My journey with AWS Serverless

Tips and lessons learned about AWS Lambda and DynamoDB over the last year

background photo source: https://unsplash.com/@withluke
You can scale your Lambda in many different ways, e.g.: a) start all possible instances at once, b) scale up by 60 additional instances per minute, to a maximum of 1,000 concurrent invocations (with SQS), c) set provisioned concurrency to always keep a minimum number of instances running (to prevent cold starts), or d) set reserved concurrency (to never run more than a maximum number of instances).
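With the Serverless Framework, options (c) and (d) can be set per function. A minimal sketch, assuming hypothetical function names and numbers:

```yaml
# serverless.yml (sketch; function names and values are illustrative)
functions:
  renderPage:
    handler: src/render.handler
    # (c) keep 5 instances warm to avoid cold starts
    provisionedConcurrency: 5
  callExternalApi:
    handler: src/api.handler
    # (d) never run more than 10 instances at once
    reservedConcurrency: 10
```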

Async architecture with AWS Lambda

Split your logic

The tricky part is to organise the architecture properly — split logic into multiple functions and transfer data between them. Splitting makes sense especially if one part of the logic is invoked more often than the others, or if one part is more complicated and takes more time or memory to compute. Another reason could be that you want a concurrency limit for one function — for example to avoid sending too many requests to an external API or DB (throttling issues).

Know the difference between sync and async Lambda

You can invoke Lambda synchronously or asynchronously. If your logic is simple and you expect a direct response from your function, invoke Lambda synchronously (InvocationType = RequestResponse). For example, Lambda calls via API Gateway or Elastic Load Balancer are synchronous, as you expect a response. If the result can be processed later, invoke Lambda asynchronously (InvocationType = Event): Lambda queues the event, returns immediately and retries on failure.
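A minimal sketch of the difference, showing only the parameter objects you would pass to the SDK's Invoke call (the interface below is a hand-written subset of the real InvokeCommandInput, to keep the sketch self-contained):

```typescript
// Hand-written subset of the Invoke API's input shape.
interface InvokeParams {
  FunctionName: string;
  InvocationType: "RequestResponse" | "Event";
  Payload: string;
}

// Builds parameters for Lambda's Invoke API. "Event" makes the call
// asynchronous: Lambda queues the event and returns immediately.
function invokeParams(
  functionName: string,
  payload: object,
  async = false
): InvokeParams {
  return {
    FunctionName: functionName,
    InvocationType: async ? "Event" : "RequestResponse",
    Payload: JSON.stringify(payload),
  };
}

// Synchronous: the caller waits for the function's result.
const syncCall = invokeParams("my-service", { id: 123 });
// Asynchronous: fire and forget.
const asyncCall = invokeParams("my-service", { id: 123 }, true);
```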

Within AWS Step Functions you can orchestrate an event-driven application // source: AWS
The on-failure destination gives a bit more options than a dead-letter queue // source: AWS

Think about multiple ways of triggering Lambda

You can trigger Lambdas by events emitted from different sources. The list of possible event sources is long.

Apart from AWS services, you can integrate your Lambda with partner event sources (via EventBridge) // source: AWS
  • SQS — queues (pull-based) are an ideal solution when you expect throttling problems — for example if your traffic is very dynamic and you don't want to lose any messages, or if you want to optimize Lambda autoscaling (60 additional instances per minute, up to 1,000 concurrent invocations);
  • EventBridge (CloudWatch Events) — used for more complex event management, where you can filter by event patterns, subscribe to scheduled jobs (cron), 3rd-party emitters and more — including communication between accounts;
  • Kinesis — dedicated to streaming or data-driven applications;
  • S3 — you can trigger Lambda based on changes in an S3 bucket, e.g. one service uploads a file and another does some operation on it;
  • DynamoDB — Lambda can read records from the DB stream, so you can react each time data changes.
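As an illustration of the SQS option above, here is a minimal sketch of an SQS-triggered handler (the event types are hand-written subsets of the real ones; in a real project use @types/aws-lambda, and the business logic is hypothetical):

```typescript
// Hand-written subsets of the SQS event shape, to keep the sketch
// self-contained.
interface SQSRecord {
  messageId: string;
  body: string;
}
interface SQSEvent {
  Records: SQSRecord[];
}

// SQS invokes the function with a batch of records; a thrown error makes
// the whole batch visible again in the queue for a retry.
export const handler = async (event: SQSEvent): Promise<string[]> => {
  const processedIds: string[] = [];
  for (const record of event.Records) {
    const message = JSON.parse(record.body);
    // ...hypothetical business logic operating on `message` goes here...
    if (message) processedIds.push(record.messageId);
  }
  return processedIds;
};
```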
Cheatsheet for choosing async event sources for Lambda // source: @theburningmonk

Get some inspirations from others

There are many different ways to deal with events in AWS, so I encourage you to read about how others build their async architectures. Great sources of patterns and solutions are the AWS Solutions Reference Architectures and cdkpatterns.com.

The Big Fan is a popular pattern based on SNS and SQS (filtering to fan out events to different consumers) // source: cdkpatterns.com

Understand Lambda execution to improve speed

When you are a Node.js developer, cold starts (increased invocation latency) might not be your main issue. But if your Lambda does a lot of work soon after invocation (connecting to a DB, retrieving data from SSM and so on), you might want to improve that speed — especially if it could improve UX.

The lifecycle of the execution environment — there could be many invocations within the same runtime, so you might cache some data in global Node.js scope // source: AWS
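The caching idea from the diagram can be sketched like this: anything kept in module (global) scope survives warm invocations, so expensive init work runs once per cold start (the "config" below is a stand-in for, e.g., a DB connection or SSM parameters):

```typescript
// Module scope is shared by all invocations handled by the same
// execution environment.
let cachedConfig: { loadedAt: number } | null = null;
let initCount = 0; // only to demonstrate that init runs once

async function getConfig(): Promise<{ loadedAt: number }> {
  if (!cachedConfig) {
    initCount += 1; // the expensive part, runs only on a cold start
    cachedConfig = { loadedAt: Date.now() };
  }
  return cachedConfig;
}

export const handler = async () => {
  const config = await getConfig(); // cheap on warm invocations
  return { initCount, loadedAt: config.loadedAt };
};
```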

Use Lambda layers

You might be in a situation where one service shares a lot of code with others. In Node.js you can have common node_modules, and it might not be a good idea to include all of them in each bundle, or to deploy them to each Lambda separately. Think about deploying common code as an AWS Lambda layer. It can also be useful if some part of your Lambda's code is heavy and you would like to avoid deploying it each time you change something.
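In the Serverless Framework a layer can be declared and attached roughly like this (paths and names below are illustrative):

```yaml
# serverless.yml (sketch)
layers:
  commonLibs:
    path: layers/common          # folder containing nodejs/node_modules/...
    compatibleRuntimes:
      - nodejs12.x
functions:
  orders:
    handler: src/orders.handler
    layers:
      # the framework exposes each layer as <TitleCasedName>LambdaLayer
      - { Ref: CommonLibsLambdaLayer }
```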

DynamoDB — performance and throttling

Learn the difference between DynamoDB and a relational DB

A proper understanding of NoSQL databases is key to writing performant queries. DynamoDB was created to be fast and scalable. For someone who comes from the SQL world, it might be hard to change the way of thinking about data schemas. I recommend reading Alex DeBrie’s DynamoDB guide (dynamodbguide.com) and blog, or buying “The DynamoDB Book” (dynamodbbook.com), which gives you deep knowledge about the database plus some great patterns for working with DynamoDB. A good summary of DynamoDB can be found in this article.

Think about access patterns

DynamoDB is fast because you access data directly from a partition (you always need to know the partition key and, optionally, the sort key). When you query for more than one item, it’s important to have a proper key design. One key can be a composition of different data. It’s common practice to give keys simple names (such as PK, SK) and keep different types of data in the same table (single-table design). You can differentiate the data types by adding a prefix to your key followed by a hash, such as TYPE#123. Because data in DynamoDB is organized in B-trees, you should also think about how you would like to sort / filter it, and then design your sort key accordingly. If you would like to access your data in a couple of different ways, consider using a GSI or LSI (global / local secondary index).
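A tiny sketch of composite keys for single-table design (entity names and attributes below are made up for illustration):

```typescript
// Keys are compositions of an entity-type prefix and an id, joined by a hash.
const key = (type: string, id: string): string => `${type}#${id}`;

// A user's order stored in the same table as the user itself:
const orderItem = {
  PK: key("USER", "123"),         // partition holds all data for user 123
  SK: key("ORDER", "2020-06-01"), // orders sort by date within the partition
  total: 42,
};
```

A query like "PK = USER#123 AND begins_with(SK, ORDER#)" would then return all of the user's orders, already sorted by date thanks to the B-tree layout.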

Single-table design is a great pattern to deal with DynamoDB // source: alexdebrie.com

Keep in mind DynamoDB limits

DynamoDB has some limits which you should always keep in mind:

  • the max result size for a Scan/Query operation is 1 MB
  • batch get (BatchGetItem): max 100 items, 16 MB; batch write (BatchWriteItem): max 25 items, 16 MB
  • 3,000 RCU (Read Capacity Units) and 1,000 WCU (Write Capacity Units) throughput limit per partition
  • 40,000 RCU and WCU default limit per table, 80,000 per account
  • max 10 GB per item collection sharing the same partition key, when the table has a local secondary index (global secondary indexes don’t count towards this limit)
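For example, the 25-item BatchWriteItem limit means larger writes have to be chunked first. A generic helper sketch:

```typescript
// Splits an array into batches of at most `size` items, e.g. to respect
// BatchWriteItem's 25-item limit.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 60 items split into batches of 25 gives 25 + 25 + 10:
const batches = chunk(Array.from({ length: 60 }, (_, i) => i), 25);
```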

Distribute traffic

If your application has heavy traffic or operates on large payloads, you might hit throughput limits. That’s why it’s important to design your partition keys properly, to avoid a situation where one partition has much more traffic than the others (a “hot” partition key). If you struggle with proper key design, you can try to distribute writes across more partitions by sharding.
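Write sharding can be sketched as appending a deterministic suffix to the hot key (the shard count and hash function below are illustrative; reads then have to query every shard and merge the results):

```typescript
// Spread a hot partition key over SHARD_COUNT sub-partitions.
const SHARD_COUNT = 10;

function shardedKey(baseKey: string, uniqueId: string): string {
  // tiny deterministic hash of uniqueId, mapped to 0..SHARD_COUNT-1
  let hash = 0;
  for (const ch of uniqueId) {
    hash = (hash * 31 + ch.charCodeAt(0)) % SHARD_COUNT;
  }
  return `${baseKey}#${hash}`;
}

// Writes for "EVENTS" now land on EVENTS#0 .. EVENTS#9, so no single
// partition takes all the traffic.
```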

Handle errors

DynamoDB should be reliable, especially if you’re using Global Tables, but don’t forget about handling errors — log them, think about retry logic and set up a DLQ for your Lambda.
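A minimal retry sketch with exponential backoff (not a replacement for the AWS SDK's built-in retries; the attempts and delays are illustrative defaults):

```typescript
// Retries an async operation with exponentially growing delays.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 100
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i += 1) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // wait 100 ms, 200 ms, 400 ms, ... between attempts
        await new Promise((resolve) =>
          setTimeout(resolve, baseDelayMs * 2 ** i)
        );
      }
    }
  }
  throw lastError;
}
```

In a real function you would also log each failure before retrying, and let the event land in the DLQ once retries are exhausted.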

Others

Do not forget about monitoring

If you are a Node.js developer, you might be interested in the pino logger. It has a lot of helpful features, such as redaction (if you would like to avoid showing sensitive data in your logs), levels (e.g. to hide debug logs on prod) or child loggers — to add common data to each log.
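A sketch of the options described above (the option names level and redact, and logger.child, come from pino's documented API; the values are illustrative):

```typescript
// The options object you would pass to pino(options).
const isProduction: boolean = false; // e.g. derived from process.env.NODE_ENV

const loggerOptions = {
  level: isProduction ? "info" : "debug", // hide debug logs on prod
  // mask sensitive fields in every log line
  redact: ["req.headers.authorization", "user.password"],
};

// const logger = pino(loggerOptions);
// const requestLogger = logger.child({ requestId: "abc-123" }); // common data
```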

Creating charts in 3rd party solutions might be easier // source: New Relic Documentation

Do not overcomplicate your stack

Your services can grow quickly. You might think of putting all function / resource definitions into one CloudFormation stack (e.g. in a single serverless.yaml / Serverless Framework). While this sounds good for a few functions (which together deliver a single feature), in the future it could cause you some trouble.

  • splitting the architecture into multiple stacks might improve your build / deploy process;
  • keep your global / common resources (database, queues, topics) out of the stacks you deploy often;
  • if you really need to keep all resources in a single CloudFormation stack, think about nested stacks.

Choose the best framework for you

If you are new to Serverless and want to create something simple, think about AWS Amplify.

A common practice is to use Terraform to manage shared infrastructure and Serverless or SAM to deploy apps. AWS Amplify is interesting for beginners and for simple or mobile apps. You can also try the plain AWS SDK without touching any framework.

Listen to podcasts, read newsletters, prepare for a certificate

My favorite resources are the Off-by-none newsletter plus podcasts: AWS Podcast, Serverless Chats and Real World Serverless.

Written by

Full-stack developer - react.js, node.js, serverless. @machnicki
