Serverless PDF news renderer in Node.js

May 3, 2019 · 7 min read

Engineers spend a lot of time debugging and hunting for performance bottlenecks: memory leaks, split-brain issues and so on. You need to know how to write good code, but also how to predict which parts of a service will need more power, so that you can eventually move them into a separate microservice or a serverless function (e.g. AWS Lambda).

A PDF rendering engine is a good case for AWS Lambda, because it uses a lot of memory compared to the rest of my application. I want to share the implementation of a PDF news app that I built on AWS: EC2, S3, DynamoDB and Lambda.

Concept

The application renders news as a PDF for a chosen country and category. As the news source I use the newsapi.org REST endpoint. The UI is a simple React app and the server is built with Express.js. Source code: https://github.com/machnicki/aws-pdf/tree/version/1

Workflow

Before the Express API calls newsapi.org, it first looks for a matching document in S3. If the PDF has already been generated, the API returns it instead of creating a fresh one (a simple 1-hour cache mechanism). The app creates only one PDF for a particular combination of time (date and hour), category and country. ${country}-${category}-${date}.pdf is the unique key for the document and also its file name.
The Express API calls newsapi.org with the chosen category and country. The response is in JSON format.
After receiving the news as JSON, the Express API generates an HTML document from that JSON data and an HTML template (the template is kept in DynamoDB). Templating is done with handlebars (https://handlebarsjs.com/). The generated HTML document is saved into DynamoDB.
The Express API invokes the Lambda function, passing the file name (the unique key of the HTML document) and expecting the URL of the generated PDF in response.
The Lambda function gets the document's HTML from DynamoDB.
The Lambda function generates the PDF (I use the html-pdf node module: https://github.com/marcbachmann/node-html-pdf) and saves it into S3.
In its response, the Lambda function returns the URL of the generated PDF document.
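The cache key from the first step can be sketched like this. It is a minimal illustration, not the code from the repository: the pdfKey helper and the exact date format (UTC, YYYY-MM-DD-HH) are assumptions, since the text only says that the key combines date and hour.

```javascript
// Hypothetical helper: builds the `${country}-${category}-${date}.pdf` key.
// Assumption: "date" means the UTC date plus the hour, e.g. "2019-05-03-14".
function pdfKey(country, category, now = new Date()) {
  // "2019-05-03T14:25:00.000Z" -> "2019-05-03T14" -> "2019-05-03-14"
  const hourStamp = now.toISOString().slice(0, 13).replace('T', '-');
  return `${country}-${category}-${hourStamp}.pdf`;
}
```

Two requests for the same country and category within the same hour produce the same key, so the second one can be served straight from S3.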

React SPA

For the UI I used create-react-app with react-bootstrap. You can find all the frontend code here: https://github.com/machnicki/aws-pdf/tree/version/1/src.

For the purposes of this article, the important part is how the JavaScript takes the HTML form data, sends it to the API and opens the generated PDF in a new window: https://github.com/machnicki/aws-pdf/blob/version/1/src/App.js#L33-L53

The UI sends the country and category to the API and opens a new window with the received URL (PDF)
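A rough sketch of that flow, under stated assumptions: the requestPdf helper is hypothetical, the JSON request body and the { url } response shape are guesses at the contract, and fetch and window.open are passed in as arguments so the function can also run outside a browser. See the linked App.js for the real implementation.

```javascript
// Hypothetical helper, not the code from App.js.
// fetchFn and openFn are injected: in a browser pass window.fetch and window.open.
async function requestPdf({ country, category }, fetchFn, openFn) {
  const response = await fetchFn('/api', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ country, category }),
  });
  if (!response.ok) {
    throw new Error(`PDF request failed with status ${response.status}`);
  }
  // Assumption: the API answers with JSON of the form { url: "..." }.
  const { url } = await response.json();
  openFn(url, '_blank'); // open the generated PDF in a new window or tab
  return url;
}
```

In a browser this could be called as requestPdf({ country: 'us', category: 'business' }, fetch.bind(window), window.open.bind(window)). Keep in mind that some popup blockers reject window.open calls that do not happen directly inside a user gesture.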

Express backend

I decided to work with the newest ES syntax and experimental ES modules (Node.js version 11). That's why I need the --experimental-modules flag to run my Node.js server: https://github.com/machnicki/aws-pdf/blob/version/1/package.json#L20.

I have defined a couple of simple API endpoints. The most important is POST /api, which triggers PDF generation based on the UI payload: https://github.com/machnicki/aws-pdf/blob/version/1/api/index.mjs#L21.

The other endpoints are outside the scope of this article; they are used to edit the template and to download the list of documents or a particular document. You can explore them yourself, enjoy!

The main GET / route serves the UI (index.html). Depending on the environment, this is either the webpack dev server or a static build.

DynamoDB vs. DocumentDB vs. S3 store

In my application I need to store each document's HTML code so that the Lambda function can consume it. It's a massive string, so I cannot just pass it as a payload. I decided to use a NoSQL database. DynamoDB was simple enough, but I also considered another NoSQL database from Amazon, DocumentDB, which gives me a MongoDB-compatible API and the possibility to save larger data (16MB per document, whereas the maximum item size in DynamoDB is 400KB, which might not be enough for large HTML code). For now I'm sticking with DynamoDB.

Documents are kept as a simple pair: id (file name) and html (base64-encoded HTML). The template used for generating documents is a special document with the name “template”.

All the code that creates the DB structure can be found here: https://github.com/machnicki/aws-pdf/blob/version/1/api/db/createTable.mjs

In the future I will probably drop DynamoDB and keep all data directly in S3. S3 would then notify Lambda to create a new PDF whenever it receives an HTML document. I am also considering the S3 Select and Amazon Athena tools.
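To illustrate the id/html pair, here is a minimal sketch of encoding and decoding such an item with Node's built-in Buffer. The toItem and fromItem helpers are hypothetical names, and the actual DynamoDB read and write calls are left out.

```javascript
// Hypothetical helpers: convert between raw HTML and the stored { id, html } pair.
function toItem(id, html) {
  // The HTML is stored base64-encoded, as described above.
  return { id, html: Buffer.from(html, 'utf8').toString('base64') };
}

function fromItem(item) {
  // Decode the stored base64 string back into the original HTML.
  return Buffer.from(item.html, 'base64').toString('utf8');
}
```

Base64 makes the stored string roughly a third larger than the raw HTML, which is worth remembering next to the 400KB item limit.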

EC2, DynamoDB, Lambda via LocalStack

Create HTML document and save in DB

My POST /api endpoint gets the country and category as parameters and calls newsapi.org for news in JSON format. Based on that data and the HTML template (which was created in the previous chapter) I create an HTML document (using the handlebars node module) and save it into DynamoDB. Next I call the Lambda function, passing the file name (the id of the HTML document in DynamoDB), and expect the URL of the PDF document in return (the PDF is generated and saved into S3 by Lambda, as explained later).

getPDF is called on the POST /api Express route

Source code: https://github.com/machnicki/aws-pdf/blob/version/1/api/pdf.mjs and https://github.com/machnicki/aws-pdf/blob/abf5f072f04d5bb680708186fbf237f54d1fb225/api/pdf-generator.mjs (invoking AWS Lambda)

LocalStack

I was looking for a solution that would let me write all my serverless code locally before deploying it to AWS.

LocalStack gives me local versions of S3, DynamoDB and Lambda.

LocalStack on GitHub: localstack/localstack (github.com). “A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline! …”

To work with all the services locally (Lambda included) you need to run LocalStack via Docker (as I have described in the README: https://github.com/machnicki/aws-pdf/tree/version/1#aws-lambda-pdf-news-generator).

All services run on different ports, and each Lambda function runs in its own container.

In the config file you need to use the LOCALSTACK_HOSTNAME env variable instead of localhost to allow those containers to talk to each other.

Check my config file https://github.com/machnicki/aws-pdf/blob/version/1/api/config.mjs, which returns different values depending on the environment. For production (AWS) I don’t need to specify endpoint parameters for each service. Additionally, in the Lambda environment I don’t need to specify accessKeyId and secretAccessKey (Lambda permissions are configured in the AWS console).
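The idea behind such a config can be sketched as follows. This is not the content of config.mjs: the awsConfig function, the NODE_ENV check, the dummy credentials and the port numbers (4569, 4572 and 4574, which I believe were LocalStack's defaults for DynamoDB, S3 and Lambda at the time) are all assumptions to verify against your own setup.

```javascript
// Hypothetical sketch of an environment-aware AWS config.
function awsConfig(env = process.env) {
  const region = env.AWS_REGION || 'us-east-1';

  // Production (AWS): no endpoint overrides are needed.
  if (env.NODE_ENV === 'production') {
    return { s3: { region }, dynamodb: { region }, lambda: { region } };
  }

  // Inside LocalStack containers LOCALSTACK_HOSTNAME is set; otherwise use localhost.
  const host = env.LOCALSTACK_HOSTNAME || 'localhost';
  const local = (port) => ({
    region,
    endpoint: `http://${host}:${port}`,
    // Assumption: LocalStack accepts any dummy credentials.
    accessKeyId: 'test',
    secretAccessKey: 'test',
  });

  return {
    s3: { ...local(4572), s3ForcePathStyle: true }, // path-style URLs for a local S3
    dynamodb: local(4569),
    lambda: local(4574),
  };
}
```

Each part can then be handed to the matching aws-sdk client, for example new AWS.DynamoDB(awsConfig().dynamodb).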

Lambda

Apart from its source code, my Lambda (https://github.com/machnicki/aws-pdf/blob/version/1/lambda/pdf.mjs) also needs something that will convert HTML into PDF. I decided to use the html-pdf node module, which is based on phantomjs.

The Lambda function receives the file name as a parameter, loads the HTML document from DynamoDB, generates the PDF, saves it into S3 and returns the URL of that PDF.
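The steps above can be sketched as a handler with its dependencies injected. This is only an outline of the flow, not the code from pdf.mjs: createHandler, loadHtml, renderPdf, savePdf and the fileName payload field are hypothetical names, and the real DynamoDB, html-pdf and S3 calls would live inside those three functions.

```javascript
// Hypothetical factory: the three dependencies hide DynamoDB, html-pdf and S3.
function createHandler({ loadHtml, renderPdf, savePdf }) {
  return async function handler(event) {
    const { fileName } = event; // assumed payload field
    if (!fileName) {
      throw new Error('fileName is required');
    }
    const html = await loadHtml(fileName);     // 1. read the HTML document by its id
    const pdf = await renderPdf(html);         // 2. convert the HTML into a PDF buffer
    const url = await savePdf(fileName, pdf);  // 3. store the PDF and get its URL
    return { url };                            // 4. the caller receives the PDF URL
  };
}
```

Wiring it up would look like exports.handler = createHandler({ loadHtml, renderPdf, savePdf }). Keeping the AWS calls behind plain functions also makes the flow easy to test with fakes.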

Generator

In most cases the Serverless framework is an easy and quick way to develop Lambda functions, but in my code I rely only on the aws-sdk node module.

This is what I needed:

write functions in Node.js 11 with experimental ES modules enabled,
deploy a large phantomjs binary file (for generating PDFs),
depending on the environment, deploy the function to AWS or to LocalStack.

Webpack bundler

At the time of writing, AWS Lambda supports only Node.js versions 6 and 8, so code written for Node.js 11 with ES modules cannot run there as is.

I decided to use Webpack to create a single bundle file that AWS Lambda can run.

Function which returns one bundle file as source code (string)
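For orientation, a webpack configuration for this kind of bundle might look roughly like the fragment below. It is an assumption-laden sketch written as a standalone config file, whereas the repository runs webpack from code and returns the bundle as a string; the entry path and output file name are taken from names mentioned in this article, and the dist directory is made up.

```javascript
// webpack.config.js (hypothetical; assumes webpack 4)
const path = require('path');

module.exports = {
  mode: 'production',
  target: 'node',                      // bundle for Node.js, not for the browser
  entry: './lambda/pdf.mjs',           // the Lambda source written with ES modules
  output: {
    path: path.resolve(__dirname, 'dist'), // assumed output directory
    filename: 'PDFGenerator.js',
    libraryTarget: 'commonjs2',        // emit module.exports so Lambda can load the handler
  },
};
```

The commonjs2 library target is what turns the ES module exports into a CommonJS module that the older Lambda runtimes can require.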

Phantomjs binary

My AWS Lambda creator (https://github.com/machnicki/aws-pdf/blob/version/1/lambda/create.mjs) can create a zip file from the Lambda function's source file plus additional files (like the phantomjs binary).

A Lambda package can be at most 50MB. For zip files larger than 10MB it’s recommended to upload them to S3 first and then point Lambda at that file.

For larger zip files it’s better to upload them to S3 first

I have discovered that it’s better to generate the zip file from the command line than with the node-native-zip node module, which works fine for simple functions but not necessarily for large files like the phantomjs binary (50MB and more).

If you run into issues with the zip generator in my code, try another one. The command line is probably a good choice.
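The choice between a direct upload and S3 can be sketched as a small helper that builds the Code parameter for the aws-sdk createFunction call. The codeParam helper and the bucket and key names are hypothetical, and the 10MB threshold is simply the recommendation quoted above; this is not the logic from create.mjs.

```javascript
// Threshold taken from the recommendation above (10MB).
const DIRECT_UPLOAD_LIMIT = 10 * 1024 * 1024;

// Hypothetical helper: returns the `Code` part of Lambda createFunction params.
// For the S3 variant the caller must upload the zip to bucket/key beforehand.
function codeParam(zipBuffer, { bucket, key }) {
  if (zipBuffer.length <= DIRECT_UPLOAD_LIMIT) {
    return { ZipFile: zipBuffer };          // small package: send the bytes inline
  }
  return { S3Bucket: bucket, S3Key: key };  // large package: point Lambda at S3
}

// Usage sketch (names are placeholders):
// lambda.createFunction({
//   FunctionName: 'PDFGenerator',
//   Runtime: 'nodejs8.10',
//   Handler: 'PDFGenerator.handler',
//   Role: roleArn,
//   Code: codeParam(zipBuffer, { bucket: 'my-bucket', key: 'PDFGenerator.zip' }),
// }).promise();
```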

In the end, the zip file that you deploy to AWS Lambda should include 3 files:

PDFGenerator.js (commonjs bundle, the main function)
phantomjs (binary; you need to download the Linux version to run it in the AWS Lambda environment, e.g. from https://bitbucket.org/ariya/phantomjs/downloads/)
scripts/pdf_a4_portrait.js (needed by the html-pdf node module to generate the A4 format)

This is how I generate the zip file with all 3 necessary files. You can also generate the zip file manually from the command line.

In the README (https://github.com/machnicki/aws-pdf/blob/version/1/README.md) you will find the information you need to run the application locally.
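As an alternative to a zip library, the command-line route could be driven from Node roughly like this. It is a sketch under assumptions: it relies on the standard zip CLI being installed, zipArgs and createZip are hypothetical helpers, and it expects the three files to be laid out in one build directory.

```javascript
// File names come from the list above; the directory layout is an assumption.
const PACKAGE_FILES = ['PDFGenerator.js', 'phantomjs', 'scripts/pdf_a4_portrait.js'];

// Builds the arguments for the standard CLI call: zip -r <archive> <files...>
function zipArgs(zipPath, files = PACKAGE_FILES) {
  return ['-r', zipPath, ...files];
}

// Runs `zip` inside buildDir so the archive contains relative paths only.
async function createZip(buildDir, zipPath) {
  // Dynamic import keeps this sketch loadable as CommonJS or as an ES module.
  const { execFileSync } = await import('child_process');
  execFileSync('zip', zipArgs(zipPath), { cwd: buildDir, stdio: 'inherit' });
  return zipPath;
}
```

Calling createZip('./build', 'PDFGenerator.zip') would be equivalent to running zip -r PDFGenerator.zip PDFGenerator.js phantomjs scripts/pdf_a4_portrait.js inside the build directory.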

I hope you enjoyed this article. In the near future I will try other AWS tools and refactor the current code:

Make sure that only the creator of a PDF has access to it (authentication)
Move the rest of the app from EC2 into Lambda
Replace DynamoDB (as the store for HTML code) with S3 events
Run performance tests and make improvements