My journey with AWS Serverless
Tips and lessons learned about AWS Lambda and DynamoDB over the last year
Serverless is a revolution for interpreted, single-threaded languages such as Node.js or Python. JavaScript engineers can now build scalable, fast architectures without going deep into multithreading issues. Node.js and Python processes initialise quickly, which makes them an ideal choice for AWS Lambda. Functions can run as many instances in parallel, autoscaling with real traffic. You can control that scaling: you can stop too many instances from spinning up (reserved concurrency) or keep a minimum number of instances ready to avoid cold starts (provisioned concurrency). One part of your application can run in, e.g., 200 concurrent processes while another runs in 10. And you pay only for what you use.
Here are some notes and tips that might be helpful if you are starting your journey with AWS Lambda and DynamoDB.
Async architecture with AWS Lambda
Split your logic
The tricky part is organising the architecture properly: splitting logic into multiple functions and passing data between them. Splitting makes sense especially if one part of the logic is invoked more often than another, or if one part is more complex and takes more time or memory to compute. Another reason could be that you want a concurrency limit on one function, for example to avoid sending too many requests to an external API or database (throttling issues).
Keep in mind that Lambda initialisation takes some time (though usually less than for JVM-based runtimes), so it might not be worth splitting out logic whose computation takes less than a second.
Know the difference between sync and async Lambda
You can invoke a Lambda function synchronously or asynchronously. If your logic is simple and you expect a direct response from your function, invoke it synchronously (InvocationType = RequestResponse). For example, Lambda calls via API Gateway or an Elastic Load Balancer are synchronous, because the caller expects a response.
For more complex logic, it's recommended to split it across functions and invoke them asynchronously (InvocationType = Event). If one Lambda calls another synchronously, keep in mind that you pay twice: the caller is billed for the time it spends waiting, and the callee is billed for its own execution.
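A minimal sketch of choosing between the two invocation types, assuming the request fields of the Lambda `Invoke` API (`FunctionName`, `InvocationType`, `Payload`). The `buildInvokeRequest` helper and the function names are hypothetical; in a real project you would pass this object to the AWS SDK's Lambda client, and `Payload` is a JSON string here only for readability (SDK versions differ in the exact type they expect).

```typescript
// Subset of the Lambda Invoke API request fields.
type InvocationType = "RequestResponse" | "Event";

interface InvokeRequest {
  FunctionName: string;
  InvocationType: InvocationType;
  Payload: string; // JSON-encoded event
}

// Hypothetical helper: synchronous when the caller needs the result,
// asynchronous ("fire and forget") otherwise.
function buildInvokeRequest(
  functionName: string,
  event: unknown,
  needsResponse: boolean
): InvokeRequest {
  return {
    FunctionName: functionName,
    // RequestResponse: the caller waits (and is billed while waiting).
    // Event: Lambda queues the event and returns immediately (HTTP 202).
    InvocationType: needsResponse ? "RequestResponse" : "Event",
    Payload: JSON.stringify(event),
  };
}

const syncReq = buildInvokeRequest("get-user", { id: "123" }, true);
const asyncReq = buildInvokeRequest("send-welcome-email", { id: "123" }, false);
console.log(syncReq.InvocationType, asyncReq.InvocationType);
```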
An example of async, event-driven architecture: one Lambda publishes a message to a topic (SNS) or an event bus (EventBridge), or updates a record in a database, and another Lambda is subscribed to that specific event or stream. Communication is event driven: no Lambda waits for a response, as it would in the sync model.
You can orchestrate logic with AWS Step Functions, which makes an event-driven application easier to maintain. If you want to orchestrate an async workflow but need a synchronous response once the workflow finishes (e.g. through API Gateway), consider Synchronous Express Workflows. For some applications Step Functions might be too costly, because you pay for the workflow itself (state transitions, or duration for Express Workflows) on top of the Lambda execution time, so use it carefully. Sometimes it is better to stick with EventBridge or Lambda destinations instead.
Asynchronous invocation uses Lambda's own internal event queue (not to be confused with SQS). You can configure retries for failed events (up to 2 retries) and how long events are kept (up to 6 hours), e.g. when the function doesn't have enough capacity to handle all incoming requests (throttling errors). Keep in mind that the event queue is eventually consistent, so an event can occasionally be delivered more than once, and your function should be idempotent. If you expect throttling, consider putting an SQS queue in front of the function instead of relying on the event queue, so events are durably buffered (SQS can retain messages for up to 14 days) until they can be processed.
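Because the same event can arrive more than once, the handler should be idempotent. A minimal sketch, assuming each event carries a unique `id` (a hypothetical field set by the producer). The in-memory `Set` is only a stand-in: module memory is not shared between execution environments, so in production the claim would typically be a DynamoDB `PutItem` with `ConditionExpression: "attribute_not_exists(PK)"`.

```typescript
// Hypothetical event shape: producers attach a unique id to every event.
interface OrderEvent {
  id: string;
  orderId: string;
}

// Stand-in for a persistent idempotency store (see lead-in).
const processedIds = new Set<string>();

// Returns true if this call claimed the id, false if it was seen before.
function claimOnce(id: string): boolean {
  if (processedIds.has(id)) return false;
  processedIds.add(id);
  return true;
}

// Records the side effect we must not repeat (e.g. charging a customer).
const charged: string[] = [];

async function handler(event: OrderEvent): Promise<"processed" | "duplicate"> {
  if (!claimOnce(event.id)) {
    return "duplicate"; // already handled: acknowledge and skip
  }
  charged.push(event.orderId);
  return "processed";
}
```

Claiming before the side effect means a run that fails after the claim will not be reprocessed by a retry; a fuller implementation stores an "in progress" state and clears it on failure.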
Don't forget about error handling: you can configure an SQS queue or SNS topic as a dead-letter queue, or as an on-failure destination. A dead-letter queue gives you access to the discarded events; an on-failure destination gives you the events and also details of the error response.
Think about multiple ways of triggering Lambda
You can trigger Lambda functions with events from many different sources, and the list of supported event sources is long.
The most popular ways to trigger Lambda are:
- SNS: pub/sub (push); one or more Lambdas publish a message to a topic, and other Lambdas or services subscribe to it;
- SQS: queues (pull) are ideal when you expect throttling, for example when traffic is very spiky and you want to avoid losing messages, or when you want to smooth out Lambda autoscaling (at the time of writing, up to 60 additional instances per minute, to a maximum of 1,000 concurrent invocations);
- EventBridge (formerly CloudWatch Events): for more complex event management, where you can filter by event patterns, run scheduled jobs (cron), receive events from third-party emitters, and more, such as communication between accounts;
- Kinesis: designed for streaming and data-driven applications;
- S3: trigger a Lambda on changes in an S3 bucket, e.g. one service uploads a file and another service processes it;
- DynamoDB: a Lambda can read records from the table's stream, so you can react every time data changes.
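A sketch of reacting to DynamoDB stream records, assuming the stream view type includes both images (`NEW_AND_OLD_IMAGES`). The record type is a minimal subset of what Lambda delivers (`eventName`, `dynamodb.Keys`, `NewImage`, `OldImage`, in DynamoDB's attribute-value format); the `status` attribute and `statusChanges` function are hypothetical.

```typescript
// Minimal subset of a DynamoDB Streams record as delivered to Lambda.
// Images use DynamoDB's attribute-value format, e.g. { S: "text" }.
type AttributeValue = { S?: string; N?: string };

interface StreamRecord {
  eventName: "INSERT" | "MODIFY" | "REMOVE";
  dynamodb: {
    Keys: Record<string, AttributeValue>;
    NewImage?: Record<string, AttributeValue>;
    OldImage?: Record<string, AttributeValue>;
  };
}

// Hypothetical reaction: collect status changes, e.g. to publish them as events.
function statusChanges(records: StreamRecord[]): string[] {
  const changes: string[] = [];
  for (const r of records) {
    if (r.eventName !== "MODIFY") continue; // only updates carry a "before" state
    const before = r.dynamodb.OldImage?.status?.S;
    const after = r.dynamodb.NewImage?.status?.S;
    if (before !== after) {
      changes.push(`${r.dynamodb.Keys.PK.S}: ${before} -> ${after}`);
    }
  }
  return changes;
}
```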
By default, SNS and SQS don't guarantee that messages are delivered in the order they were published, and occasionally a message may be delivered more than once, so you might need a deduplication mechanism. If you need guaranteed ordering and want to prevent duplicate deliveries, consider SNS FIFO or SQS FIFO.
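A sketch of how a producer might build a message for an SQS FIFO queue, assuming the `SendMessage` request fields `MessageGroupId` (ordering is guaranteed within a group) and `MessageDeduplicationId` (a repeated ID within SQS's 5-minute deduplication window is dropped). The event type, its `version` field and the queue URL are hypothetical.

```typescript
// Subset of the SQS SendMessage request fields relevant to FIFO queues.
interface FifoSendMessageRequest {
  QueueUrl: string;
  MessageBody: string;
  MessageGroupId: string;
  MessageDeduplicationId: string;
}

// Hypothetical domain event with a monotonically increasing version.
interface OrderStatusChanged {
  orderId: string;
  status: string;
  version: number;
}

function toFifoMessage(queueUrl: string, e: OrderStatusChanged): FifoSendMessageRequest {
  return {
    QueueUrl: queueUrl,
    MessageBody: JSON.stringify(e),
    // One group per order: updates for the same order stay in sequence,
    // while different orders can still be processed in parallel.
    MessageGroupId: e.orderId,
    // Derived from business data, so a retried publish reuses the same id.
    MessageDeduplicationId: `${e.orderId}-${e.version}`,
  };
}

const message = toFifoMessage(
  "https://sqs.eu-west-1.amazonaws.com/123456789012/orders.fifo",
  { orderId: "o-1", status: "PAID", version: 3 }
);
```

Using a business-derived deduplication ID (rather than a random one) is what makes a producer retry safe: the second send carries the same ID and is discarded.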
Get inspiration from others
There are many ways to handle events in AWS, so I encourage you to read about how others build their async architectures. Great sources of patterns and solutions are AWS Solutions Reference Architectures and cdkpatterns.com.
Understand Lambda execution to improve speed
If you are a Node.js developer, cold starts (increased invocation latency) might not be your main issue. But if your Lambda does a lot of work right after invocation (e.g. connecting to a database or retrieving data from SSM), you might want to speed that up, especially if it improves UX.
A good idea is to cache some data in memory, so it can be reused by later invocations in the same execution environment. You can use Middy middlewares (e.g. the cache and ssm packages) or memoize pure functions, e.g. with Memoizee.
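A minimal, library-free sketch of the idea: a module-scope `Map` survives between invocations handled by the same execution environment, so a value loaded once is reused until its TTL expires. `cached`, `loadConfig` and the 5-minute TTL are hypothetical; Middy's middlewares or Memoizee package up roughly the same pattern.

```typescript
// Module scope: lives as long as the execution environment, not per invocation.
const cache = new Map<string, { value: unknown; expiresAt: number }>();

async function cached<T>(key: string, ttlMs: number, load: () => Promise<T>): Promise<T> {
  const now = Date.now();
  const hit = cache.get(key);
  if (hit && hit.expiresAt > now) return hit.value as T; // warm: skip the slow call
  const value = await load();
  cache.set(key, { value, expiresAt: now + ttlMs });
  return value;
}

// Hypothetical loader standing in for an SSM or database call.
let loads = 0;
async function loadConfig(): Promise<{ apiUrl: string }> {
  loads++;
  return { apiUrl: "https://api.example.com" };
}

async function handler(): Promise<string> {
  const config = await cached("config", 5 * 60 * 1000, loadConfig);
  return config.apiUrl;
}
```

A TTL keeps the cache from serving stale configuration forever, since a warm environment can live for a long time.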
Consider provisioned concurrency (the requested number of execution environments are always initialised and ready to respond to your function's invocations), but be careful: it can be too costly in some cases. If your traffic is very dynamic, you can manage provisioned concurrency with Application Auto Scaling.
Learn about the Lambda execution environment lifecycle. It can be a good idea to run some logic in the Init phase, outside the handler, so it runs when the runtime starts (e.g. ahead of time with provisioned concurrency) rather than during an invocation. The Init phase is limited to 10 seconds. Async code in Node.js (such as loading SSM parameters with the SDK) might not finish during the Init phase: the environment is frozen and the work resumes during the first invocation. If you need to finish loading data (e.g. from SSM or a database) in the Init phase, try Python, or, in Node.js runtimes that support ES modules, use top-level await.
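One common Node.js pattern is to start the async work at module scope (during Init) and await the resulting promise inside the handler. If loading finishes during Init, the first invocation gets the data immediately; if not, it completes during the first invocation, and later invocations reuse the resolved promise. `loadSecrets` is a hypothetical stand-in for an SSM or database call.

```typescript
// Hypothetical async loader (e.g. reading parameters from SSM).
let initCalls = 0;
async function loadSecrets(): Promise<{ dbPassword: string }> {
  initCalls++;
  await new Promise<void>((resolve) => setTimeout(() => resolve(), 10)); // simulate latency
  return { dbPassword: "example" };
}

// Started at module load (Init phase). If the environment is frozen before it
// settles, it continues during the first invocation.
const secretsPromise = loadSecrets();

async function handler(): Promise<boolean> {
  // Awaiting guarantees the data is ready before use; later invocations
  // get the already-resolved promise without calling loadSecrets again.
  const secrets = await secretsPromise;
  return secrets.dbPassword.length > 0;
}
```

Note that a rejected init promise stays rejected for the lifetime of the environment, so production code may want to recreate it on failure.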
The problem of async code in Node.js isn't limited to the Init phase. If you run async code in a Lambda, make sure it has finished before you return a response or resolve the handler. The safest way is to keep async logic in promises with async/await instead of callbacks. Callback patterns (such as setTimeout) can lead to execution leaks, where code started in one invocation runs in a different, later invocation.
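A small illustration of the difference, using a hypothetical `writeAudit` side effect: the leaky handler schedules work and returns, while the safe handler awaits it.

```typescript
// Hypothetical async side effect, e.g. writing an audit record.
const written: string[] = [];
function writeAudit(entry: string): Promise<void> {
  return new Promise((resolve) =>
    setTimeout(() => {
      written.push(entry);
      resolve();
    }, 10)
  );
}

// Risky: the timer callback may run after the handler has resolved,
// i.e. during a later invocation, or never if the environment is frozen.
async function leakyHandler(id: string): Promise<string> {
  setTimeout(() => written.push(`leaky:${id}`), 10);
  return "ok";
}

// Safe: all async work is awaited before the handler resolves.
async function safeHandler(id: string): Promise<string> {
  await writeAudit(`safe:${id}`);
  return "ok";
}
```

In a local script both timers eventually fire, which is why this bug is easy to miss in tests; in Lambda the environment can be frozen as soon as `leakyHandler` resolves.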
Use Lambda layers
You might find that one service shares a lot of code with others. In Node.js you may have common node_modules, and it might not be a good idea to include all of them in each bundle and deploy them with every Lambda separately. Consider deploying common code as an AWS Lambda layer. This is also useful if part of your Lambda code is heavy and you want to avoid redeploying it every time you change something.
DynamoDB — performance and throttling
Learn the difference between DynamoDB and relational databases
A proper understanding of NoSQL databases is key to writing performant queries. DynamoDB was created to be fast and scalable. If you come from the SQL world, it might be hard to change the way you think about data schemas. I recommend reading Alex DeBrie's DynamoDB guide (dynamodbguide.com) and blog, or buying “The DynamoDB Book” (dynamodbbook.com), which gives you deep knowledge of the database plus some great patterns for working with DynamoDB. You can find a good summary of DynamoDB in this article.
Think about access patterns
DynamoDB is fast because you access data directly by partition (you always need to know the partition key and, optionally, the sort key). When you query for more than one item, proper key design is important. A key can be a composition of different pieces of data. It's common practice to give keys generic names (such as PK and SK) and keep different types of data in the same table (single-table design). You can distinguish data types by prefixing the key with the type followed by a hash sign, such as TYPE#123. Because items within a partition are stored sorted by sort key (in B-trees), you should also think about how you want to sort and filter your data, and design your sort key accordingly. If you want to access your data in several different ways, consider a GSI or LSI (global / local secondary index).
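A sketch of the key scheme described above, with hypothetical entities: a customer profile and its orders share a partition key, and the order sort key starts with an ISO-8601 timestamp (which sorts chronologically as a string), so one Query can return a customer's orders newest first. The query object uses DocumentClient-style plain values in `ExpressionAttributeValues`; the low-level API would wrap them as `{ S: ... }`.

```typescript
interface Keys {
  PK: string;
  SK: string;
}

// Hypothetical single-table key builders.
const customerKeys = (customerId: string): Keys => ({
  PK: `CUSTOMER#${customerId}`,
  SK: `PROFILE#${customerId}`,
});

const orderKeys = (customerId: string, orderId: string, createdAt: Date): Keys => ({
  PK: `CUSTOMER#${customerId}`, // same partition as the customer profile
  SK: `ORDER#${createdAt.toISOString()}#${orderId}`, // sortable by time
});

// Query input for "all orders of a customer, newest first".
function ordersQuery(tableName: string, customerId: string) {
  return {
    TableName: tableName,
    KeyConditionExpression: "PK = :pk AND begins_with(SK, :prefix)",
    ExpressionAttributeValues: {
      ":pk": `CUSTOMER#${customerId}`,
      ":prefix": "ORDER#",
    },
    ScanIndexForward: false, // descending sort key order
  };
}
```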
Keep in mind DynamoDB limits
DynamoDB has some limits you should always keep in mind:
- max size of a single item is 400 KB
- max result size of a single Scan/Query call is 1 MB (larger results are paginated)
- BatchGetItem: max 100 items, 16 MB; BatchWriteItem: max 25 put/delete requests, 16 MB
- 3,000 RCU (Read Capacity Units) and 1,000 WCU (Write Capacity Units) throughput limit per partition
- 40,000 RCU and 40,000 WCU default (adjustable) quota per table, 80,000 per account (provisioned mode, at the time of writing)
- max 10 GB per item collection (all items sharing the same partition key) on tables that have a local secondary index; global secondary indexes are not affected
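Because BatchWriteItem accepts at most 25 put or delete requests per call, larger writes have to be split. A minimal sketch, with a hypothetical `toBatchWriteRequests` helper building DocumentClient-style `RequestItems` per chunk:

```typescript
// BatchWriteItem accepts at most 25 put/delete requests per call.
const BATCH_WRITE_LIMIT = 25;

function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// One BatchWriteItem input per chunk, all targeting the same table.
function toBatchWriteRequests<T>(tableName: string, items: T[]) {
  return chunk(items, BATCH_WRITE_LIMIT).map((group) => ({
    RequestItems: {
      [tableName]: group.map((item) => ({ PutRequest: { Item: item } })),
    },
  }));
}
```

Chunking only handles the item-count limit; the 16 MB request size and 400 KB item size still apply to each chunk.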
Distribute traffic
If your application has high traffic or operates on large payloads, you might hit throughput limits. That's why it's important to design your partition keys so that one partition doesn't receive much more traffic than the others (a “hot” partition key). If you struggle to find a good design, you can try distributing writes across more partitions with write sharding.
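A sketch of two write-sharding variants: a random suffix spreads writes best but forces reads to query every shard, while a suffix calculated from the item ID lets you find an item again directly. The shard count, key format and simple string hash are hypothetical choices for illustration.

```typescript
// Hypothetical shard count: more shards spread writes wider,
// but mean more queries when reading everything back.
const SHARD_COUNT = 10;

// Random suffix: best write distribution, reads must query all shards.
function randomShardKey(baseKey: string): string {
  const shard = Math.floor(Math.random() * SHARD_COUNT);
  return `${baseKey}#${shard}`;
}

// Calculated suffix: the same item id always maps to the same shard.
function calculatedShardKey(baseKey: string, itemId: string): string {
  let hash = 0;
  for (let i = 0; i < itemId.length; i++) {
    hash = (hash * 31 + itemId.charCodeAt(i)) % SHARD_COUNT;
  }
  return `${baseKey}#${hash}`;
}

// Every partition key a reader has to query to collect the whole data set.
function allShardKeys(baseKey: string): string[] {
  const keys: string[] = [];
  for (let i = 0; i < SHARD_COUNT; i++) keys.push(`${baseKey}#${i}`);
  return keys;
}
```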
Handle errors
DynamoDB is very reliable, especially if you're using Global Tables, but don't forget about error handling: log errors, think about retry logic, and set a DLQ for your Lambda.
If you're using the AWS SDK, retries with exponential backoff are enabled by default. This is useful, because the SDK will retry requests that fail due to limits, throttling or 5xx errors.
The situation is more complicated when you use batch operations (BatchGetItem, BatchWriteItem). Some operations in a request can fail (e.g. because of throttling) while others succeed, yet you still receive a 200 response and the SDK won't retry the request. You need to write your own retry/backoff mechanism to handle any UnprocessedKeys / UnprocessedItems in the response.
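A minimal sketch of such a retry loop with exponential backoff and "full jitter". The `send` function is a hypothetical wrapper: in real code it would call BatchWriteItem (or BatchGetItem) and return whatever came back in `UnprocessedItems` (or `UnprocessedKeys`), so the loop itself stays SDK-independent.

```typescript
// A batch call that returns the subset of requests it did not process.
type BatchSender<T> = (requests: T[]) => Promise<T[]>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

async function sendWithRetry<T>(
  requests: T[],
  send: BatchSender<T>,
  maxAttempts = 5,
  baseDelayMs = 50
): Promise<void> {
  let pending = requests;
  for (let attempt = 1; pending.length > 0; attempt++) {
    pending = await send(pending); // only unprocessed items are resent
    if (pending.length === 0) return;
    if (attempt >= maxAttempts) {
      throw new Error(`${pending.length} items still unprocessed after ${attempt} attempts`);
    }
    // Full jitter: random delay between 0 and base * 2^(attempt - 1).
    await sleep(Math.random() * baseDelayMs * 2 ** (attempt - 1));
  }
}
```

Throwing after the last attempt lets the Lambda fail visibly, so the event ends up in your DLQ or on-failure destination instead of being silently dropped.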
Other tips
Do not forget about monitoring
If you are a Node.js developer, you might be interested in the pino logger. It has many helpful features, such as redaction (to avoid showing sensitive data in your logs), log levels (e.g. to hide debug logs in production) and child loggers (to add common data to every log entry).
Proper logging (e.g. payloads, IDs, states, errors) will help you debug your code (e.g. by building queries in CloudWatch Logs Insights). This is especially important in event-driven applications, where you cannot debug the whole workflow within one trace ID.
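The ideas above (levels, redaction, child loggers that carry common fields such as a correlation ID) can be sketched without any dependency. This is a hand-rolled illustration, not pino's API; `createLogger` and the `correlationId` field are hypothetical. Writing one JSON object per line lets CloudWatch Logs Insights discover the fields automatically, so you can filter a whole event-driven workflow by a correlation ID passed along in each event.

```typescript
type Level = "debug" | "info" | "error";
type Fields = Record<string, unknown>;

const LEVEL_ORDER: Record<Level, number> = { debug: 10, info: 20, error: 30 };

interface Logger {
  debug(msg: string, data?: Fields): void;
  info(msg: string, data?: Fields): void;
  error(msg: string, data?: Fields): void;
  child(bindings: Fields): Logger;
}

// Hand-rolled sketch (NOT pino's API) of levels, redaction and child bindings.
function createLogger(
  minLevel: Level,
  redactKeys: string[],
  bindings: Fields = {},
  sink: (line: string) => void = (line) => console.log(line)
): Logger {
  // Replaces top-level sensitive fields before they reach the log.
  const redact = (data: Fields): Fields => {
    const copy: Fields = { ...data };
    for (const key of redactKeys) {
      if (key in copy) copy[key] = "[REDACTED]";
    }
    return copy;
  };

  const write = (level: Level, msg: string, data: Fields = {}): void => {
    if (LEVEL_ORDER[level] < LEVEL_ORDER[minLevel]) return; // below threshold
    sink(JSON.stringify({ time: new Date().toISOString(), level, msg, ...bindings, ...redact(data) }));
  };

  return {
    debug: (msg, data) => write("debug", msg, data),
    info: (msg, data) => write("info", msg, data),
    error: (msg, data) => write("error", msg, data),
    // A child inherits settings and adds fields to every line it writes.
    child: (extra) => createLogger(minLevel, redactKeys, { ...bindings, ...extra }, sink),
  };
}
```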
You should also consider creating alarms for the metrics that matter to you. For example, CloudWatch anomaly detection on Lambda invocations sounds interesting.
Don't forget about charts, diagrams and other statistics. In CloudWatch you can create dashboards, but also consider third-party solutions. Many services work well with AWS and make debugging and performance tracking much easier, e.g. New Relic, Datadog or Sumo Logic.
Do not overcomplicate your stack
Your services can grow quickly. You might be tempted to put all function and resource definitions into one CloudFormation stack (e.g. in a single serverless.yaml with the Serverless Framework). While this works well for a few functions (which together deliver a single feature), it can cause trouble in the future.
For a long time, CloudFormation had a limit of 200 resources per stack, and because one function needs several resources (the Lambda function, versions, roles, etc.), there was a high risk of hitting that limit quickly, especially if you kept common resources (such as S3, DynamoDB, SNS, SQS) in the same stack. That limit has since been increased to 500, but you should still avoid keeping that many resources in one stack.
If you hit that limit, moving resources from one stack to another can be painful: you need to make sure all your resources stay available during the operation. AWS supports importing existing resources into a stack, but it's not easy to set up, especially if you use a framework such as Serverless to generate and deploy the CloudFormation stack.
In summary:
- think about the resource limit from the beginning, to avoid painful refactoring;
- splitting your architecture into multiple stacks might improve your build / deploy process;
- keep your global / common resources (databases, queues, topics) out of stacks you deploy often;
- if you really need to keep all resources in a single CloudFormation stack, consider nested stacks.
Choose the best framework for you
If you are new to serverless and want to build something simple, consider AWS Amplify.
For larger projects it's worth trying the Serverless Framework, which simplifies building and deploying Lambda services (and their surrounding resources). It has a large community and many plugins (e.g. Webpack or Offline).
The Serverless Framework supports different providers, but if you want something dedicated to AWS, try the AWS Serverless Application Model (SAM).
You should also consider Terraform, which is a great tool for managing persistent, shared infrastructure. It's common practice to use Terraform for cloud infrastructure and SAM or Serverless for deploying applications.
Listen to podcasts, read newsletters, prepare for a certification
My favorite resources are the Off-by-none newsletter and these podcasts: AWS Podcast, Serverless Chats, Real World Serverless.
You can also find very useful information on the AWS YouTube channel, AWS Solutions Reference Architectures and cdkpatterns.com.
Some Twitter accounts worth following: @jeremy_daly, @sheenbrisals, @danilop, @dabit3, @theburningmonk, @jbesw, @houlihan_rick, @jeffbarr, @alexbdebrie, @awsgeek.
For beginners, a great resource is serverless-stack.com.
If you want to become an AWS architect, you can prepare for a certification. I can recommend the Ultimate AWS Certified Solutions Architect Associate course.
Exploring the serverless world has opened my eyes to problems and areas I still need to learn about. This was just a short list of tips, but I hope you found something useful.