Building a serverless app with AWS SAM

In this post I’ll be writing about my experience building a serverless app using AWS SAM.

The app is called sns-s3-influxdb. It transfers newly fetched air quality measurements from OpenAQ’s services to an external time series database.

Essential AWS terms #

It helps to know the common AWS services to follow this post. Here are brief definitions of the technologies involved.

I’ll also be mentioning InfluxDB here; you can learn about the database in my previous post.

What the app does #

The app doesn’t try to do a lot. It reads incoming SNS messages, which describe which S3 object was created. We read that S3 object, convert its contents to a special format, then feed the data into a special database. It’s more of a pipeline component than an app, really.

Concretely, OpenAQ publishes SNS events when new fetch data is available. I wanted to get my hands on that new data and feed it into a time series database.

SNS messaging #

Let’s take a look at what the incoming SNS messages contain.

The main event message has the following structure (details omitted):

{"Records": [
    {
        "EventSource": "aws:sns",
        "Sns": {
            "Type": "Notification",
            "TopicArn": "arn:aws:sns:us-east-1:470049585876:NewFetchObject",
            "Subject": "Amazon S3 Notification",
            "Message": "THE-ACTUAL-MESSAGE-AS-JSON-STRING"
        }
    }
]}

The message content is itself a JSON string, with a similar Records structure (details omitted):

{"Records": [
    {
        "eventSource": "aws:s3",
        "eventName": "ObjectCreated:CompleteMultipartUpload",
        "s3": {
            "configurationId": "NewFetchObject",
            "bucket": {
                "name": "openaq-fetches",
                "arn": "arn:aws:s3:::openaq-fetches"
            },
            "object": {
                "key": "realtime/2020-06-11/1591870866.ndjson",
                "size": 1466820
            }
        }
    }
]}
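
To make the nesting concrete, here’s a minimal sketch of unwrapping the two layers in Python. The function name extract_s3_objects is my own for illustration, not something taken from the app’s code. It assumes the event has the shape shown above, and it URL-decodes the object key because S3 notifications encode keys.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs for every S3 object named in an SNS-triggered event.

    Hypothetical helper: assumes the event shape shown in the post.
    """
    objects = []
    for sns_record in event["Records"]:
        # The S3 notification arrives as a JSON *string* inside the SNS envelope,
        # so it needs a second round of parsing.
        s3_event = json.loads(sns_record["Sns"]["Message"])
        for s3_record in s3_event.get("Records", []):
            s3 = s3_record["s3"]
            # Object keys in S3 notifications are URL-encoded.
            key = unquote_plus(s3["object"]["key"])
            objects.append((s3["bucket"]["name"], key))
    return objects


if __name__ == "__main__":
    inner = {
        "Records": [
            {
                "s3": {
                    "bucket": {"name": "openaq-fetches"},
                    "object": {"key": "realtime/2020-06-11/1591870866.ndjson"},
                }
            }
        ]
    }
    event = {"Records": [{"EventSource": "aws:sns", "Sns": {"Message": json.dumps(inner)}}]}
    print(extract_s3_objects(event))
```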

The app workflow #

Now that I know more about the message contents, I devise the app workflow:

Or in short:

SNS → Lambda → Get S3 object → Format data → Write to InfluxDB
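
The “Format data” step is the least obvious one, so here’s a rough sketch of what it could look like: turning newline-delimited JSON into InfluxDB line protocol. This is not the app’s actual code. The record fields (location, parameter, value, date.utc), the timestamp format, and the default measurement name air_quality are all assumptions on my part, and the measurement name is assumed to contain no characters that need escaping.

```python
import json
from datetime import datetime, timezone


def _escape_tag(value):
    """Escape commas, equals signs and spaces, which are special in tag values."""
    return str(value).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line_protocol(record, measurement="air_quality"):
    """Format one measurement record as a line protocol string.

    Assumed record shape (hypothetical):
    {"location": ..., "parameter": ..., "value": ..., "date": {"utc": "2020-06-11T10:00:00.000Z"}}
    """
    tags = {"location": record["location"], "parameter": record["parameter"]}
    tag_str = ",".join(f"{k}={_escape_tag(v)}" for k, v in sorted(tags.items()))
    ts = datetime.strptime(record["date"]["utc"], "%Y-%m-%dT%H:%M:%S.%fZ")
    # Line protocol timestamps default to nanoseconds since the Unix epoch.
    ts_ns = int(ts.replace(tzinfo=timezone.utc).timestamp()) * 10**9
    return f"{measurement},{tag_str} value={float(record['value'])} {ts_ns}"


def ndjson_to_lines(text, measurement="air_quality"):
    """Convert newline-delimited JSON into a list of line protocol strings."""
    return [
        to_line_protocol(json.loads(line), measurement)
        for line in text.splitlines()
        if line.strip()  # skip blank lines
    ]


if __name__ == "__main__":
    sample = '{"location": "Test Station", "parameter": "pm25", "value": 12.5, "date": {"utc": "2020-06-11T10:00:00.000Z"}}\n'
    for line in ndjson_to_lines(sample):
        print(line)
```

Keeping the formatting in pure functions like this means it can be checked without touching S3 or InfluxDB.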

Building the app #

Upon starting with sam init, you have the option to select from a sample hello-world app or a custom template.

I got started with the hello-world app and adapted from there. You can start from existing templates such as sns-message-python or use them as references.

I replaced the API event that was in the hello-world app with SNS.

Testing locally #

I started testing locally with sam local invoke. Since I didn’t know how to artificially trigger the SNS event [1], I hardcoded an S3 object for the function to download, to check that it was working as expected.
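
In hindsight, one way around the missing trigger is to build the event by hand. Here’s a small sketch that wraps an S3 notification in an SNS envelope, mirroring the structure from the SNS messaging section. The helper name make_sns_event is made up for this example, and the event contains only the fields shown in this post, so a real event will carry more.

```python
import json


def make_sns_event(bucket, key):
    """Build a minimal SNS-wrapped S3 notification for local testing (hypothetical helper)."""
    s3_notification = {
        "Records": [
            {
                "eventSource": "aws:s3",
                "eventName": "ObjectCreated:CompleteMultipartUpload",
                "s3": {
                    "bucket": {"name": bucket},
                    "object": {"key": key},
                },
            }
        ]
    }
    return {
        "Records": [
            {
                "EventSource": "aws:sns",
                "Sns": {
                    "Type": "Notification",
                    "Subject": "Amazon S3 Notification",
                    # SNS delivers the S3 notification as a JSON string.
                    "Message": json.dumps(s3_notification),
                },
            }
        ]
    }


if __name__ == "__main__":
    event = make_sns_event("openaq-fetches", "realtime/2020-06-11/1591870866.ndjson")
    print(json.dumps(event, indent=2))
```

Redirecting the output to a file gives you something to pass to sam local invoke with the --event option.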

Packaging Python dependencies #

This part wasn’t completely obvious from the documentation, but I ended up putting a requirements.txt file in the function directory, and that worked.

Make it configurable with environment variables #

I made the deployment configurable so that the InfluxDB settings can be read from the environment variables.

At first, I set the variables on the Lambda function configuration page in the AWS Console. That works, but redeploying the app wipes the variables.

You could define them in the app template, but that doesn’t work for deployment (it should). It’s a known issue with a workaround.

You set up the environment variables for the function in the template, as you would expect. The workaround is to point those variables at template parameters, and to provide the parameter values at deployment time.
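
On the function side, reading the settings is plain Python. Here’s a minimal sketch, assuming environment variable names like INFLUXDB_URL and INFLUXDB_TOKEN. Those names are hypothetical and may differ from the ones in the app’s template.

```python
import os
from typing import NamedTuple


class InfluxDBSettings(NamedTuple):
    url: str
    bucket: str
    org: str
    token: str
    measurement: str


# Hypothetical variable names; use whatever the template defines.
_ENV_NAMES = {
    "url": "INFLUXDB_URL",
    "bucket": "INFLUXDB_BUCKET_NAME",
    "org": "INFLUXDB_ORG",
    "token": "INFLUXDB_TOKEN",
    "measurement": "INFLUXDB_MEASUREMENT_NAME",
}


def load_settings(environ=None):
    """Read the InfluxDB settings from environment variables, failing early if any is missing."""
    environ = os.environ if environ is None else environ
    missing = [name for name in _ENV_NAMES.values() if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return InfluxDBSettings(**{field: environ[name] for field, name in _ENV_NAMES.items()})
```

Failing early with the list of missing names makes a wiped or misnamed variable show up clearly in the logs, instead of surfacing later as a confusing write error.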

Build #

In my case, sam build was failing with a dependency resolution error. I used the option to build in a container:

$ sam build --use-container
Starting Build inside a container
Building function 'WriteToInfluxDBFunction'

Fetching lambci/lambda:build-python3.7 Docker container image......
Mounting /Users/dulguun/source/sns-s3-influxdb/write_to_influxdb as /tmp/samcli/source:ro,delegated inside runtime container

Build Succeeded

Built Artifacts  : .aws-sam/build
Built Template   : .aws-sam/build/template.yaml

Commands you can use next
=========================
[*] Invoke Function: sam local invoke
[*] Deploy: sam deploy --guided
    
Running PythonPipBuilder:ResolveDependencies
Running PythonPipBuilder:CopySource

The build outputs are in the .aws-sam/build directory. You can see that the app module and its dependencies are there:

$ tree -L 3 .aws-sam
.aws-sam
└── build
    ├── WriteToInfluxDBFunction
    │   ├── __init__.py
    │   ├── app.py
    │   ├── boto3
    │   ├── boto3-1.14.12.dist-info
    │   ...
    └── template.yaml

34 directories, 7 files

Deployment #

sam deploy takes care of several things:

The full deploy command with parameters is:

sam deploy --guided --parameter-overrides \
    InfluxDBUrl="https://eu-central-1-1.aws.cloud2.influxdata.com" \
    InfluxDBBucketName=YOUR_BUCKET_NAME \
    InfluxDBOrg=YOUR_ORG_ID \
    InfluxDBToken=YOUR_BUCKET_TOKEN \
    InfluxDBMeasurementName=YOUR_MEASUREMENT_NAME

On the first deploy, you need the --guided option. This will create a local samconfig.toml file that’s used on subsequent deploys.

If for some reason the stack creation fails, the only way forward is to retry. The failed stack is left behind, and you need to clean it up manually.

You can keep an eye on the stack status from the CLI too:

aws cloudformation list-stacks

When successful, the stack will have the CREATE_COMPLETE status.
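
The list-stacks output gets long once you have a few stacks, including deleted ones. Here’s a small sketch that pulls out the status of one stack from the JSON the command prints. It assumes the CLI is set to JSON output, and stack_statuses is a helper name I made up.

```python
import json


def stack_statuses(list_stacks_output, stack_name):
    """Return the statuses of all stacks with the given name.

    list_stacks_output is the JSON text printed by `aws cloudformation list-stacks`.
    Deleted stacks are listed too, so one name can appear more than once.
    """
    summaries = json.loads(list_stacks_output)["StackSummaries"]
    return [s["StackStatus"] for s in summaries if s["StackName"] == stack_name]


# Usage sketch:
#   output = subprocess.run(
#       ["aws", "cloudformation", "list-stacks", "--output", "json"],
#       capture_output=True, text=True, check=True,
#   ).stdout
#   print(stack_statuses(output, "write-openaq-fetches-to-influxdb"))
```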

Tailing the log #

After deployment, we wait for real SNS events to occur. Here’s how to tail the log from the command line:

sam logs -n WriteToInfluxDBFunction --stack-name write-openaq-fetches-to-influxdb -t

Disable retries #

In the debugging phase, I recommend that you set the function retries to zero.
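
SNS invokes Lambda asynchronously, so the setting in question is the function’s asynchronous invocation configuration. Here’s one way to set it from Python, sketched with boto3’s put_function_event_invoke_config call. The client is passed in so the function itself has no AWS dependency, and the function name in the usage comment is a placeholder for the deployed function’s real name.

```python
def disable_async_retries(lambda_client, function_name):
    """Set the asynchronous-invocation retry count of a Lambda function to zero.

    lambda_client is expected to be a boto3 Lambda client.
    """
    return lambda_client.put_function_event_invoke_config(
        FunctionName=function_name,
        MaximumRetryAttempts=0,
    )


# Usage sketch (needs boto3 and AWS credentials):
#   import boto3
#   disable_async_retries(boto3.client("lambda"), "YOUR_DEPLOYED_FUNCTION_NAME")
```

Remember to set the retries back once the function is stable, if you want failed invocations to be retried.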

Configure permissions #

ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied

Your function needs permission to read from S3. You can add it through the console, but it’s better to have it in the template.

Debugging InfluxDB #

Now all is well on the Lambda side.

I did spend some time debugging the InfluxDB write operation, but I won’t get into too much detail here.

Briefly: I was getting Bad Request responses from InfluxDB. I changed a couple of things:

And the whole pipeline works now! I have incoming data in InfluxDB via Lambda via S3 via SNS!

You can see the app repository at sns-s3-influxdb.

Conclusion #

The sam CLI is pretty useful and takes you most of the way. It did require some professional duckduckgoing on the CloudFormation templating part, but I wasn’t very familiar with it, so that’s expected.

The documentation is useful, but I felt like some details like dependency packaging were missing. You need to try some things to see what sticks.

I can see myself using AWS CDK for my next project (didn’t know about it before building this app!). Instead of YAML, you use a programming language of your choice to define the cloud resources you need. And at a glance, it seems to have much better documentation and support.


  1. If you have the correct event JSON file, you can use it with sam local invoke --event event.json.