CORI Data API
Overview
The cori-data-api repository (as distinct from the cori.data.api package) houses all the code to:
- Create a RestApi with Python Lambdas connecting to a PostgreSQL Database.
- Create a GraphQL Api with Typescript Lambdas connecting to the Python RestAPI (as a DataSource).
- Create a CICD Pipeline that connects to this Github repository, is triggered on commits/PRs to specific branches, and re-builds and re-deploys the two updated APIs.
Architecture
External Dependencies
This project has two external dependencies:
- PostgreSQL database (Amazon RDS)
- Redis Cache (Hosted on Redis Cloud)
You can safely customize and update these services in their respective interfaces. All other updates to this project should be handled IN CODE and deployed through the CICD Pipeline by committing to a specific branch.
Environment Setup
Requirements
- NodeJS 16.x+ - Installing NodeJS
- npm 8.x+ - (needed for NPM Workspaces) - (should be installed as part of NodeJS installation)
- AWS CLI - Installing AWS CLI
- AWS SAM CLI - Installing SAM CLI
- AWS CDK V2 - Installing AWS CDK V2
- Python 3.9+ - Installing Python
NodeJS 16.x+ and NPM 8.x+
This project uses NPM Workspaces to manage multiple packages from your local file system within a single top-level root package. NPM Workspaces requires npm version 8+, which should be installed as part of the NodeJS 16.x+ installation.
AWS CLI
The AWS Command Line Interface (AWS CLI) is an open source tool that enables you to interact with AWS services using commands in your command-line shell. With minimal configuration, the AWS CLI enables you to start running commands that implement functionality equivalent to that provided by the browser-based AWS Management Console from the command prompt in your terminal program.
AWS SAM CLI
AWS SAM provides you with a command line tool, the AWS SAM CLI, that makes it easy for you to create and manage serverless applications. You need to install and configure a few things in order to use the AWS SAM CLI.
AWS CDK V2
The AWS CDK lets you build reliable, scalable, cost-effective applications in the cloud with the considerable expressive power of a programming language. This approach yields many benefits, including:
- Build with high-level constructs that automatically provide sensible, secure defaults for your AWS resources, defining more infrastructure with less code.
- Use programming idioms like parameters, conditionals, loops, composition, and inheritance to model your system design from building blocks provided by AWS and others.
- Put your infrastructure, application code, and configuration all in one place, ensuring that at every milestone you have a complete, cloud-deployable system.
- Employ software engineering practices such as code reviews, unit tests, and source control to make your infrastructure more robust.
- Connect your AWS resources together (even across stacks) and grant permissions using simple, intent-oriented APIs.
- Import existing AWS CloudFormation templates to give your resources a CDK API.
- Use the power of AWS CloudFormation to perform infrastructure deployments predictably and repeatedly, with rollback on error.
- Easily share infrastructure design patterns among teams within your organization or even with the public.
Python 3.9+
Additional Suggested Environment Setup
- A Node Version Manager - Installing NVM or Installing n
Installation and Development
Getting started
Clone the repo
git clone https://github.com/ruralinnovation/cori-data-api.git
Change into project directory
cd cori-data-api
Install libraries for all packages
npm install
Set local environment (shell) variables:
$ export INTEGRATION_TESTING_USERNAME=<aws-cognito-username>
$ export INTEGRATION_TESTING_PASSWORD=<aws-cognito-password>
At this point, you should be able to build the api and run the test suite:
$ npm run build
$ npm run test
Working with NPM Workspaces
We suggest reading through the NPM Workspaces documentation before attempting to install any new packages or work with this repository.
Workspaces let you organize your code in a mono-repo with multiple packages (projects) that share a single dependency tree, reducing build time and redundant packages.
Project Structure
Directory Structure
Overview
- .github - Configuration for Github Actions
- .vscode - VSCode configuration for local debugging
- docs - Documentation resources
- postgresql - Supportive database scripts and documentation
- packages/infrastructure - Typescript/NodeJS infrastructure (CDK) code
- packages/graphql-schemas - Typescript/NodeJS GraphQL (Lambda) code
- packages/python-lambdas - Python Lambdas, business logic and code
This project uses NPM Workspaces to manage these packages (infrastructure, graphql-schemas, and python-lambdas) from your local file system within a single top-level root package.
For more information please READ THE DOCS
Database Integration
The main data source for the APIs is a PostgreSQL RDS instance hosted in AWS. The Python RestAPI is the only integration point to the database. All endpoints provisioned in the Python RestAPI are served by Python lambdas that house the database queries and transformations.
The Python lambdas are located in the same VPC as the PostgreSQL RDS database. These lambdas have no connection to the public internet to protect the database. In addition, the RestAPI is a READ_ONLY API, which allows us to create a READ_ONLY database user/role for the lambdas to use for database access, which limits security risks.
Prerequisites
User/Role Setup
Create a READ_ONLY user
- Log in to database as admin with psql
- Create read only role and new user with:
CREATE ROLE read_only_access;
GRANT CONNECT ON DATABASE {DB_NAME} TO read_only_access;
GRANT USAGE ON SCHEMA public TO read_only_access;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO read_only_access;
GRANT SELECT ON ALL TABLES IN SCHEMA bcat TO read_only_access;
CREATE USER read_only_user WITH PASSWORD '________________';
GRANT read_only_access TO read_only_user;
- Keep note of password for next step
Save database credentials in AWS Parameter Store
- Log into AWS Console
- Go to Systems Manager/Parameter Store
- Save the database password in a parameter with a prefix, e.g. /cori/api/password
- Keep note of the parameter name
Update Api Configuration File
The ApiStackProps has a databaseConfig attribute with the following schema:
interface DatabaseConfig {
  vpcId: string;
  databaseSecurityGroupId: string;
  host: string;
  dbname: string;
  dbuser: string;
  parameterName: string;
}
Open the main configuration file for your environment in config/config.ts and update the attribute values with your database information.
The dbuser attribute value should be read_only_user if that is how you configured the user in the previous section.
The parameterName attribute value should be the name of the password parameter you stored in AWS Parameter Store in the previous step (e.g. /cori/api/password).
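For illustration, a databaseConfig entry in config/config.ts might look like the following sketch; every value is a placeholder except the dbuser and parameterName conventions described above.

// Hypothetical example values -- replace with your own environment's details.
const databaseConfig: DatabaseConfig = {
  vpcId: 'vpc-0123456789abcdef0',
  databaseSecurityGroupId: 'sg-0123456789abcdef0',
  host: 'example-cluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com',
  dbname: 'postgres',
  dbuser: 'read_only_user',
  parameterName: '/cori/api/password',
};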
Redis Cache Integration
todo
Prerequisites
User/Role Setup
todo
Update Api Configuration
todo
Local Development and Testing
You can test your python business logic with the local development server. In order to bypass the use of the Lambda Layers (which don't play nice with local servers), there is a lambda dedicated to local testing located in the python-lambdas/local directory.
The index.py file houses the lambda handler that is triggered by all local api queries. You can copy your business logic into this lambda to test out functionality. All queries should be prefixed with /local.
The local server url is http://localhost:2000/local/
Starting the Local Server
npm run start
CICD Pipeline
The CICD pipeline has been deployed in your AWS Account, and re-builds/re-deploys will be triggered by PRs to the dev and main branches in your github repository.
CDK Pipelines
CDK Pipelines Workshops
CICD Setup
Github Setup
- Create a new user in Github for CICD
- Create a Personal Access Token for this user
- Store the Personal Access Token in AWS Secrets Manager with the name
github-token
Pipeline Infrastructure
The Pipeline is configured using the CDK Pipelines construct.
CDK Pipelines is an opinionated construct library. It is purpose-built to deploy one or more copies of your CDK applications using CloudFormation with a minimal amount of effort on your part. It is not intended to support arbitrary deployment pipelines, and very specifically it is not built to use CodeDeploy to deploy applications to instances, or to deploy your custom-built ECR images to an ECS cluster directly: use CDK file assets with CloudFormation Init for instances, or CDK container assets for ECS clusters instead.
Pipeline code
The Pipeline code is located in the packages/infrastructure/src/stacks/PipelineStack.ts file.
In order to be deployed, this stack requires configuration parameters for connecting to Github as well as deploying the ApiStack.
interface PipelineStackProps {
/**
* GitHub source configuration
*/
source: {
/**
* Case-sensitive GitHub repo name
* i.e. mergingfutures/cori-data-api
*/
repo: string;
/**
* Which branch to listen on
* When changes are committed, the pipeline will trigger
*/
branch: string;
/**
* Personal access token for authentication
* i.e. cdk.SecretValue.secretsManager('mergingfutures-pat')
*/
authentication: SecretValue;
/**
* How to trigger the pipeline.
* Must have admin access on repo to use WEBHOOK.
* Only read access is required for POLL
*/
trigger?: GitHubTrigger;
};
/**
* ~~Use this to re-use an existing S3 bucket.~~
*
* TODO: Does this need to be automatically configured?!
* ANSWER: YES!
*
* Apparently the "artifact" bucket is one that must be created/configured/managed by AWS because of some system tag that it adds to the bucket, which humans are not allowed to use...
* ```
* Unknown Error
* An unexpected error occurred.
* API response
* System tags cannot be added/updated by requester
* ```
* <img width="844" alt="image" src="https://github.com/user-attachments/assets/ab4732dd-ebca-44a7-9648-277500b0a263">
*
*/
artifactBucketName?: string; // <= DO NOT SET VALUE!
/**
* Configures the api to be deployed by the pipeline
*/
ApiConfig: ApiStackProps;
/**
* Credentials for Integration Testing
*/
integrationConfig: {
userName: string;
password: string;
};
}
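For orientation, here is a hypothetical sketch of how these props might be assembled. Only the repo name, the github-token secret name, and the integration-testing credentials come from this document; the devApiConfig constant and the use of environment variables are assumptions.

import * as cdk from 'aws-cdk-lib';
import { GitHubTrigger } from 'aws-cdk-lib/aws-codepipeline-actions';

// Hypothetical example -- branch and ApiConfig values are placeholders.
const pipelineProps: PipelineStackProps = {
  source: {
    repo: 'mergingfutures/cori-data-api',
    branch: 'dev',
    authentication: cdk.SecretValue.secretsManager('github-token'),
    trigger: GitHubTrigger.WEBHOOK, // requires admin access on the repo; use POLL with read-only access
  },
  ApiConfig: devApiConfig, // an ApiStackProps entry from config/config.ts (assumed name)
  integrationConfig: {
    userName: process.env.INTEGRATION_TESTING_USERNAME!,
    password: process.env.INTEGRATION_TESTING_PASSWORD!,
  },
};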
Pipeline deployment
Each pipeline is associated with a branch and an environment.
We have set up a DEV and a PROD Pipeline for you, but you can connect additional pipelines to other branches as well.
- Check out the associated branch.
- Ensure the branch code is pushed to the remote repo
- Create an entry for the branch in config/configs
Bootstrapping
In the main pipeline account
npm run bootstrap:pipeline -- aws://{ACCOUNT-NUMBER}/{REGION} [--profile {PROFILE}]
In any other accounts (if using cross-account deploy)
npm run bootstrap:pipeline -- aws://{ACCOUNT-NUMBER}/{REGION} --trust {PIPELINE-ACCOUNT-NUMBER} [--profile {PROFILE}]
Deploy the Pipeline
Each pipeline is associated with a branch and an environment.
Check out the associated branch.
Create an entry for the branch in config/configs
Deploy the pipeline
cd packages/infrastructure
npm run deploy:pipeline -- [--profile {PROFILE}]
Once deployed, the pipeline will trigger on new commits to the associated branch.
Python Microservices
The Python Microservices are located in the python-lambdas
directory.
Directory Structure
├── dependency-layer # Shared dependency/libraries layer
├── bcat # BCAT Service
├── local # Local Development service
└── scaffolding # Scaffolding for new service (See Creating New Service Section)
Python Dependency Layer
Lambda layers provide a convenient way to package libraries and other dependencies that you can use with your Lambda functions. Using layers reduces the size of uploaded deployment archives and makes it faster to deploy your code.
A layer is a .zip file archive that can contain additional code or data. A layer can contain libraries, a custom runtime, data, or configuration files. Layers promote code sharing and separation of responsibilities so that you can iterate faster on writing business logic.
You can use layers only with Lambda functions deployed as a .zip file archive. For functions defined as a container image, you package your preferred runtime and all code dependencies when you create the container image. For more information, see Working with Lambda layers and extensions in container images on the AWS Compute Blog.
You can create layers using the Lambda console, the Lambda API, AWS CloudFormation, or the AWS Serverless Application Model (AWS SAM). For more information about creating layers with AWS SAM, see Working with layers in the AWS Serverless Application Model Developer Guide.
Creating and sharing Lambda Layers
It is important that you only include packages in your layer that a majority of the lambdas will use, as redundant libraries increase container start time and reduce performance.
You are not limited to using a single layer in a lambda and can include up to 5 layers for each individual lambda. As the API grows you may find creating an assortment of dependency layers specific to certain typological functions is necessary. The total unzipped size of the function and all layers cannot exceed the unzipped deployment package size quota of 250 MB.
The dependencies for the Python Microservices are packaged and zipped in the packages/python-lambdas/dependency-layer directory.
This zipped file is then deployed as a lambda layer dependency for all Python lambdas, which cuts down on container start time by sharing these resources across many lambdas.
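The infrastructure package handles publishing the layer; the following is only a rough sketch of the underlying CDK pattern (the asset path, construct id, and function wiring are assumptions, not the repository's actual code).

import * as lambda from 'aws-cdk-lib/aws-lambda';

// Sketch only: publish the zipped dependency directory as a shared Lambda layer.
const dependencyLayer = new lambda.LayerVersion(this, 'PythonDependencyLayer', {
  code: lambda.Code.fromAsset('packages/python-lambdas/dependency-layer'), // assumed asset path
  compatibleRuntimes: [lambda.Runtime.PYTHON_3_9],
  description: 'Shared psycopg and aws-lambda-powertools dependencies',
});

// Each Python microservice function can then reference the shared layer, e.g.:
// new lambda.Function(this, 'BcatService', { /* ... */ layers: [dependencyLayer] });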
Included Packages
- psycopg = "^3.0.14"
- psycopg-binary = "^3.0.14"
- aws-lambda-powertools = "^1.26.0"
Psycopg & Psycopg-Binary
Psycopg is the most popular PostgreSQL adapter for the Python programming language. Its core is a complete implementation of the Python DB API 2.0 specifications. Several extensions allow access to many of the features offered by PostgreSQL.
AWS Python Lambda Powertools
We leverage AWS Lambda Powertools library, which is a suite of utilities for AWS Lambda functions to ease adopting best practices such as tracing, structured logging, custom metrics, idempotency, batching, and more.
Check out this detailed blog post with a practical example.
In our experience it has a developer-friendly (Flask-like) API that makes it very easy to configure routing/endpoints in each Python Microservice.
For more information READ THE DOCS
Adding New Dependencies to the Layer
# Change into the Python Microservices directory
cd packages/python-lambdas
# Activate the Python environment
source .env/bin/activate
# Add new packages
pip install package1 package2
# Copy Packages directory into the Dist directory
cp -r ./.env/lib/python3.8/site-packages ./dist/python
# Zip up the dist directory packages
...
Creating New Services
- Copy/paste the scaffolding directory
- Rename the service directory and the name in the pyproject.toml file
- Update index.py with custom endpoints and logic
- Create a new ApiEndpoint in the ApiStack with the new service as handler for the endpoints (see Creating a new Api Endpoint)
- Push changes to the current branch to re-deploy with the pipeline
- Check the Integration Tests step in the CodePipeline interface
Api Stack & Supporting Constructs
The Core ApiStack is composed of:
- Networking Construct
- Authentication (Cognito) Construct
- Python Data Server Construct
- Apollo (GraphQL) Server Construct
- Hosting (CloudFront) Construct
Constructs are located in the packages/infrastructure/src/constructs directory.
Core ApiStack
The Root Stack for the entire API project is the ApiStack.
This stack is located at: packages/infrastructure/src/stacks/ApiStack.ts
Familiarize yourself with the ApiStackProps
specified at the top of the file. These props are all the required and optional parameters that drive the configuration and deployment of the two APIs, the supporting lambdas, the networking, the hosting, and the authentication.
export interface DatabaseConfig {
  vpcId: string;
  databaseSecurityGroupId: string;
  host: string;
  dbname: string;
  dbuser: string;
  parameterName: string;
}

export interface CacheConfig {
  host: string;
  port: number;
  username: string;
  parameterName: string;
  globalTTL: string;
}

interface AppSyncUserPoolConfig {
  userPoolId: string;
}

export interface ServiceConfig {
  /**
   * The Logical Name of the service (NO SPACES) e.g. BCATService
   */
  logicalName: string;
  /**
   * The Core path to trigger the Microservice e.g. /bcat
   */
  corePath: string;
  /**
   * The name of the directory this service is located in. e.g. bcat
   */
  directoryName: string;
}

interface AppSyncConfig {
  /**
   * Optional: When provided will configure additional user pools in the app sync authorization configuration
   */
  additionalUserPools: AppSyncUserPoolConfig[];
}

export interface ApiStackProps extends StackProps {
  env: {
    account: string;
    region: string;
  };
  client: string;
  stage: string;
  project: string;
  loggingLevel: string;
  /**
   * Retain Dynamo Table and UserPool on delete
   */
  retain: boolean;
  /**
   * Database integration configuration
   * Puts lambdas in VPC. Expecting VPC to be in another stack or deployed already.
   * DB creds are accessed through parameter store and deployed as part of the lambda service environment.
   */
  databaseConfig: DatabaseConfig;
  cacheEnabled: boolean;
  cacheConfig: CacheConfig;
  /**
   * Optional. When provided, will attach to existing Cognito for authentication.
   */
  existingCognito?: ExistingCognitoConfig;
  microservicesConfig: ServiceConfig[];
}
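To make the shape concrete, here is a hypothetical config/config.ts entry; every value below is a placeholder, and the databaseConfig constant is assumed to be the object described in the Database Integration section.

// Hypothetical example entry for config/config.ts -- replace every value with your own.
export const devConfig: ApiStackProps = {
  env: { account: '123456789012', region: 'us-east-1' },
  client: 'cori',
  stage: 'dev',
  project: 'data-api',
  loggingLevel: 'info',
  retain: false,
  databaseConfig, // the DatabaseConfig object from the Database Integration section
  cacheEnabled: false,
  cacheConfig: {
    host: 'redis.example.com',
    port: 6379,
    username: 'default',
    parameterName: '/cori/api/redis-password',
    globalTTL: '86400',
  },
  microservicesConfig: [
    { logicalName: 'BCATService', corePath: '/bcat', directoryName: 'bcat' },
  ],
};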
Custom Constructs
Networking Construct
The Networking Construct is responsible for creating a new Security Group
for all of the Python Lambdas, and enabling communication between this new Lambda Security Group and the existing Database Security Group. This construct accepts the DatabaseConfig
in order to instantiate the required resources.
All Python Lambdas are placed within your existing VPC and have no connection to the public internet.
The only INGRESS communication allowed to these lambdas is from Api Gateway over the core AWS network.
The only EGRESS communication allowed from these lambdas is to the PostgreSQL Database, and this is opened through the link between the two security groups.
This keeps the lambdas, and therefore the database, isolated from the public internet.
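A minimal sketch of that security-group wiring with CDK, assuming a construct with access to the DatabaseConfig props (identifiers and port choice are placeholders, not the construct's actual code):

import * as ec2 from 'aws-cdk-lib/aws-ec2';

// Look up the existing VPC and database security group from DatabaseConfig.
const vpc = ec2.Vpc.fromLookup(this, 'Vpc', { vpcId: props.databaseConfig.vpcId });
const dbSecurityGroup = ec2.SecurityGroup.fromSecurityGroupId(
  this,
  'DatabaseSecurityGroup',
  props.databaseConfig.databaseSecurityGroupId
);

// New security group for the Python lambdas; no inbound rules from the internet.
const lambdaSecurityGroup = new ec2.SecurityGroup(this, 'LambdaSecurityGroup', {
  vpc,
  allowAllOutbound: false,
  description: 'Security group for the Python lambdas',
});

// Allow the lambdas to reach PostgreSQL on the database security group only.
dbSecurityGroup.addIngressRule(lambdaSecurityGroup, ec2.Port.tcp(5432), 'Lambda to RDS');
lambdaSecurityGroup.addEgressRule(dbSecurityGroup, ec2.Port.tcp(5432), 'Lambda to RDS');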
AWS CDK V2 Constructs Used
More Information
Authentication (Cognito) Construct
The Authentication (Cognito) Construct is responsible for one of two things:
If you have an existing Cognito UserPool you want to use as a directory for controlling access to the APIs, this construct imports a reference to that UserPool and adds a new Postman client for development testing.
OR
If you don’t have an existing UserPool, this construct creates a new UserPool, adds a new Authentication domain, and then adds the Postman client for development testing.
This construct is then passed into the ApiGateways in both servers (PythonDataServer & ApolloServer) for attaching CognitoAuthorizers as access controls.
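A rough sketch of those two paths, assuming ExistingCognitoConfig carries a userPoolId (construct and client ids are placeholders; the new-pool path would also add the authentication domain described above):

import * as cognito from 'aws-cdk-lib/aws-cognito';

// Either import the existing pool, or create a new one.
const userPool: cognito.IUserPool = props.existingCognito
  ? cognito.UserPool.fromUserPoolId(this, 'UserPool', props.existingCognito.userPoolId)
  : new cognito.UserPool(this, 'UserPool', { selfSignUpEnabled: false });

// In both cases, add a client that Postman can use for development testing.
const postmanClient = new cognito.UserPoolClient(this, 'PostmanClient', {
  userPool,
  generateSecret: false,
  authFlows: { userPassword: true },
});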
AWS CDK V2 Constructs Used
More Information
Python Data Server Construct
The Python Data Server Construct is responsible for:
- Create the ApiGateway for your Python Microservices.
- Deploy the Dependency Layer.
- Deploy new Python Microservices.
- Create Endpoints on the ApiGateway to trigger new Microservices.
Custom CDK V2 Constructs Used
Creates Api Gateway with Cognito Authorization. Has supporting methods for adding new endpoints with lambda triggers.
Creates a new Python Lambda (microservice) with an associated Log Group.
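The custom constructs' method names are not reproduced here, so the following is only a generic sketch of the API Gateway pattern they wrap, assuming an existing RestApi (api), a Cognito user pool (userPool), and a microservice function (bcatFunction); all identifiers are illustrative.

import * as apigateway from 'aws-cdk-lib/aws-apigateway';

// Sketch only: authorize requests with Cognito and proxy a path to the microservice lambda.
const authorizer = new apigateway.CognitoUserPoolsAuthorizer(this, 'Authorizer', {
  cognitoUserPools: [userPool],
});

const bcat = api.root.addResource('bcat');
bcat.addProxy({
  defaultIntegration: new apigateway.LambdaIntegration(bcatFunction),
  defaultMethodOptions: {
    authorizer,
    authorizationType: apigateway.AuthorizationType.COGNITO,
  },
});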
AWS CDK V2 Constructs Used
More Information
Apollo (GraphQL) Server Construct
The Apollo GraphQL Server Construct is responsible for:
- Create the ApiGateway for your GraphQL Server.
- Deploy a single NodeJS Lambda function to respond to GraphQL requests.
- Create an Endpoint on the ApiGateway to trigger the lambda.
Custom CDK V2 Constructs Used
Creates Api Gateway with Cognito Authorization. Has supporting methods for adding new endpoints with lambda triggers.
AWS CDK V2 Constructs Used
More Information
Hosting (CloudFront) Construct
The Hosting Construct is responsible for:
- Create a Cloudfront Distribution.
- Create an origin on the / path for the Python RestApi.
- Create an origin on the /graphql path for the Apollo Server.
- Create a bucket for access logs.
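A minimal sketch of that arrangement, assuming restApi and apolloApi are the two API Gateway RestApis created by the server constructs (origin classes, ids, and the logging bucket settings are illustrative, not necessarily what the Hosting construct does):

import * as cloudfront from 'aws-cdk-lib/aws-cloudfront';
import * as origins from 'aws-cdk-lib/aws-cloudfront-origins';
import * as s3 from 'aws-cdk-lib/aws-s3';

// Bucket for CloudFront access logs (CloudFront log delivery requires ACL access).
const logBucket = new s3.Bucket(this, 'AccessLogsBucket', {
  objectOwnership: s3.ObjectOwnership.OBJECT_WRITER,
});

new cloudfront.Distribution(this, 'ApiDistribution', {
  defaultBehavior: { origin: new origins.RestApiOrigin(restApi) }, // Python RestApi on /
  additionalBehaviors: {
    '/graphql*': { origin: new origins.RestApiOrigin(apolloApi) }, // Apollo Server on /graphql
  },
  enableLogging: true,
  logBucket,
});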
AWS CDK V2 Constructs Used
More Information
Resources
CDK Day 2020 - Building Real-time Back Ends on AWS with AppSync and CDK
Sharing DB Snapshot between Accounts
Sharing KMS KEY