May 15, 2024

Multi-Page PDF with Distinct Layout Using Puppeteer

These days, practically every company delivers data in PDF format, whether it is your bank statement or your order details. PDFs are a common way to share information that you can view on your devices and print to keep on file. Given how widely PDFs are used, every developer should experiment with PDF-generation libraries such as pdfmake, PDFKit, and Puppeteer.

In this tutorial, I will first generate a PDF locally using Puppeteer-Core; then we will walk through an architecture that automates PDF generation with AWS services. As you may have noticed, I said Puppeteer-Core rather than Puppeteer, so let's start by examining how they differ.

What Is Puppeteer and How Is It Different from Puppeteer-Core?

Puppeteer is a Node.js library that provides a high-level API to control Chrome/Chromium over the DevTools Protocol. Puppeteer runs in headless mode by default but can be configured to run in full (“headful”) Chrome/Chromium. If this definition makes you think of a browser-control library for web automation, you are on the right track. It produces a PDF the same way you would manually from an HTML page in your browser. Puppeteer-Core is the same library without the bundled browser download: installing it does not fetch Chrome, so you must point it at a browser binary yourself. In this setup, that binary comes from @sparticuz/chromium.

Initial Setup and Requirements

Linux OS (Ubuntu)
Node.js (18.17.0)
Puppeteer-Core (21.5)
@sparticuz/chromium (119.0.2)

After successfully installing these requirements, your package.json will have these entries.

"dependencies": {
  "@sparticuz/chromium": "^119.0.2",
  "puppeteer-core": "21.5"
}

With the initial setup complete, we can move on to the code.

You can use any HTML file with your own content; I’m using test.html here, and I read it with Node's fs module.
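As a rough illustration of this step, here is a minimal sketch of local PDF generation. It assumes the launch options documented by @sparticuz/chromium; the `generatePdf` and `buildPdfOptions` names, the `output.pdf` file name, and the A4 settings are my own placeholders rather than anything fixed by this setup.

```javascript
const fs = require('node:fs/promises');

// PDF options are an illustrative choice (assumption), not a requirement.
function buildPdfOptions(outputPath) {
  return { path: outputPath, format: 'A4', printBackground: true };
}

// Hypothetical helper: reads an HTML file and prints it to a PDF file.
async function generatePdf(htmlPath, outputPath) {
  // Required lazily so this file still loads where the packages are absent.
  const puppeteer = require('puppeteer-core');
  const chromium = require('@sparticuz/chromium');

  const html = await fs.readFile(htmlPath, 'utf8');

  // puppeteer-core ships no browser, so we pass the Chromium binary explicitly.
  const browser = await puppeteer.launch({
    args: chromium.args,
    defaultViewport: chromium.defaultViewport,
    executablePath: await chromium.executablePath(),
    headless: chromium.headless,
  });

  try {
    const page = await browser.newPage();
    // Wait for the network to go idle so fonts and images finish loading.
    await page.setContent(html, { waitUntil: 'networkidle0' });
    await page.pdf(buildPdfOptions(outputPath));
  } finally {
    // Always close the browser, even if rendering fails.
    await browser.close();
  }
}

// Example call (left commented out so nothing launches on load):
// generatePdf('test.html', 'output.pdf').catch(console.error);
```

Closing the browser in `finally` matters most once this code runs unattended, because a leaked Chromium process keeps consuming memory.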


In the PDF I produced, each page has a different layout. You can write each page's content inside its own `div` tag and use the CSS `page-break-after` property to force a page break between pages.

<div style="page-break-after: always;"></div>
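One possible way to assemble such a document in code is sketched below. The `joinPages` helper and the `page-N` class names are hypothetical; only the page-break `div` comes from the snippet above.

```javascript
// Separator from the article: forces a page break after it when printed.
const PAGE_BREAK = '<div style="page-break-after: always;"></div>';

// Hypothetical helper: wraps each page fragment in its own div and puts a
// page break between consecutive pages (none after the last one).
function joinPages(pageFragments) {
  return pageFragments
    .map((fragment, index) => `<div class="page page-${index + 1}">${fragment}</div>`)
    .join(PAGE_BREAK);
}

// Two pages with different content; each can be styled through its own class.
const html = joinPages([
  '<h1>Cover</h1>',
  '<table><tr><td>Details</td></tr></table>',
]);
console.log(html);
```

Giving each page its own class lets the stylesheet target `.page-1`, `.page-2`, and so on, which is one simple way to get a distinct layout per page.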


That covers the static content of the PDF. Real-world PDFs also need dynamic content that differs from user to user.

One way to solve this is to put placeholders in our HTML template and then use a regular expression (RegExp) with the `replace` method in our JavaScript file to swap those placeholders for dynamic values.

The following snippet is part of the HTML template, where `${Receipt Number}` is the placeholder.

<td>
  <strong>Receipt Number : </strong><span>${Receipt Number}</span>
</td>

The JavaScript `replace` method then replaces this placeholder with dynamic data.

let pdfData = { 'Receipt Number': '10/04/2024' };
// Replace every ${key} placeholder with the matching value from pdfData.
let htmlContent = htmlString.replace(/\${([^}]+)}/g, (match, key) => pdfData[key.trim()]);
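One caveat with this callback: a placeholder whose key is missing from `pdfData` is replaced with the string `undefined`. The sketch below is a slightly more defensive variant, not the original approach. The `fillTemplate` and `escapeHtml` names are hypothetical, and escaping the values is my own addition for data that may contain HTML characters.

```javascript
// Escape characters that have special meaning in HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: fills ${key} placeholders from `data` and leaves
// placeholders with no matching key untouched so they are easy to spot.
function fillTemplate(template, data) {
  return template.replace(/\$\{([^}]+)\}/g, (match, key) => {
    const name = key.trim();
    return Object.hasOwn(data, name) ? escapeHtml(data[name]) : match;
  });
}

const filled = fillTemplate(
  '<span>${Receipt Number}</span>',
  { 'Receipt Number': '10/04/2024' }
);
console.log(filled);
```

Leaving unknown placeholders in place is a design choice: a stray `${...}` in the generated PDF is easier to notice during testing than a silent `undefined`.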


After this replace operation, `${Receipt Number}` in the HTML is replaced with `10/04/2024`, and the resulting HTML is later passed to Puppeteer for PDF generation. Now, let’s discuss automating PDF generation.

How to Automate PDF Generation Using AWS Services

Automating PDF generation requires a few AWS services, such as AWS Lambda, Amazon S3, and, as the event source in this example, DynamoDB.

We attach the Lambda function as a trigger to the application events that should produce a PDF, such as a database insertion. Once the trigger is added, the Lambda runs whenever that event occurs. The next piece is S3: the HTML template, which is passed to the browser to create the PDF, is stored in an S3 bucket.

The steps involved in creating a PDF are shown in the PDF Automation Design materials.

For demonstration purposes, I am using an AWS Lambda function that is triggered by DynamoDB operations, plus S3 buckets to fetch the template and store the newly created PDF. Here is how we set up this architecture in our project:

1. Create the HTML template and save it to the template bucket (template-bucket) on S3.
2. Create a Lambda function that does four main things:
a. Retrieve the template from the template-bucket S3 bucket.
b. Use the replace function to substitute the event data received in the Lambda function’s parameter for the placeholder strings in our template.
c. Create the PDF with the substituted values by passing the template to Puppeteer.
d. Put the generated PDF into another S3 bucket (pdf-bucket).
3. Trigger the Lambda on DynamoDB operations: add the Lambda function as a trigger for every operation on our DynamoDB table.
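The steps above could be sketched as a Lambda handler along the following lines. This is an illustration under several assumptions, not the exact function from this project: the object key `template.html`, the use of the stream record's `eventID` as the PDF name, the simplified `imageToData` conversion, and AWS SDK v3 for S3 access are all my own choices. It also assumes the DynamoDB stream is configured to include new images.

```javascript
// Bucket names follow the article; the template key is an assumption.
const TEMPLATE_BUCKET = 'template-bucket';
const PDF_BUCKET = 'pdf-bucket';
const TEMPLATE_KEY = 'template.html';

// Simplified conversion of a DynamoDB stream image ({ attr: { S: 'x' } })
// into a plain object. Only string (S) and number (N) attributes are handled.
function imageToData(image = {}) {
  const data = {};
  for (const [name, attribute] of Object.entries(image)) {
    if ('S' in attribute) data[name] = attribute.S;
    else if ('N' in attribute) data[name] = attribute.N;
  }
  return data;
}

// Step b: fill ${key} placeholders, leaving unknown ones untouched.
function fillTemplate(template, data) {
  return template.replace(/\$\{([^}]+)\}/g, (match, key) =>
    Object.hasOwn(data, key.trim()) ? String(data[key.trim()]) : match);
}

// Step c: render the HTML to a PDF buffer with headless Chromium.
async function renderPdf(html) {
  const puppeteer = require('puppeteer-core');
  const chromium = require('@sparticuz/chromium');
  const browser = await puppeteer.launch({
    args: chromium.args,
    defaultViewport: chromium.defaultViewport,
    executablePath: await chromium.executablePath(),
    headless: chromium.headless,
  });
  try {
    const page = await browser.newPage();
    await page.setContent(html, { waitUntil: 'networkidle0' });
    return await page.pdf({ format: 'A4', printBackground: true });
  } finally {
    await browser.close();
  }
}

async function handler(event) {
  const { S3Client, GetObjectCommand, PutObjectCommand } = require('@aws-sdk/client-s3');
  const s3 = new S3Client({});

  // Step a: retrieve the template from the template bucket.
  const response = await s3.send(
    new GetObjectCommand({ Bucket: TEMPLATE_BUCKET, Key: TEMPLATE_KEY }));
  const templateHtml = await response.Body.transformToString();

  for (const record of event.Records) {
    // Skip records without a new image, such as REMOVE events.
    if (!record.dynamodb || !record.dynamodb.NewImage) continue;

    const html = fillTemplate(templateHtml, imageToData(record.dynamodb.NewImage));
    const pdf = await renderPdf(html);

    // Step d: store the generated PDF in the output bucket.
    await s3.send(new PutObjectCommand({
      Bucket: PDF_BUCKET,
      Key: `${record.eventID}.pdf`,
      Body: pdf,
      ContentType: 'application/pdf',
    }));
  }
}

module.exports = { handler };
```

Launching Chromium once per record keeps the sketch simple; for larger batches, reusing a single browser across records would likely be faster.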


We have come to the end of this article. I hope you learned something new, just like I did when I had a requirement to generate a PDF with multiple pages and different layouts. Discovering Puppeteer fulfilled my requirement, and I’m glad to share this knowledge with you. Thank you for your time. If you have any suggestions, please feel free to provide them in the comments section.

CodeStax.Ai

May 14, 2024
