Puppeteer is a Node.js library which lets you interact with the Chrome web browser. Recent releases also include Firefox support.
Puppeteer is commonly used to automate testing, archive webpage data, and generate screenshots of live web content. It lets you control Chrome via a clear API, giving you the ability to navigate to pages, click on form controls, and issue browser commands.
Getting Puppeteer running in a Docker container can be complex as many dependencies are needed to run headless Chrome. Here’s how to get everything installed so you can use Puppeteer in a Kubernetes cluster, in an isolated container on your dev machine, or as part of a CI pipeline.
The Basic Requirements
We’re using a Debian-based image for the purposes of this article. If you’re using a different base, you’ll need to adapt the displayed package manager commands accordingly. The official Node.js image is a suitable starting point, as it means you don’t need to install Node manually.
Puppeteer is distributed via npm, the Node.js package manager. It downloads a compatible build of Chromium as part of its installation, so in theory an npm install puppeteer would get you everything you need. In practice, a clean Docker environment will lack the system libraries required to run Chrome.
As it’s ordinarily a heavyweight GUI program, Chrome depends on font, graphics, configuration, and window management libraries. These all need to be installed within your Dockerfile.
At the time of writing, the current dependency list looks like this:
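Here’s a sketch of what the install step might look like in a Dockerfile based on the official Node.js image. The node:18-slim tag and the exact package set are illustrative; check Puppeteer’s troubleshooting guide for the list that matches your release.

```dockerfile
# Illustrative base image tag; any recent official Node.js tag will work.
FROM node:18-slim

# A representative set of libraries headless Chrome needs on Debian.
# Verify against Puppeteer's troubleshooting docs for your release.
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        ca-certificates fonts-liberation wget xdg-utils \
        libasound2 libatk-bridge2.0-0 libatk1.0-0 libcairo2 libcups2 \
        libdbus-1-3 libexpat1 libfontconfig1 libgbm1 libglib2.0-0 libgtk-3-0 \
        libnspr4 libnss3 libpango-1.0-0 libpangocairo-1.0-0 \
        libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 \
        libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 \
    && rm -rf /var/lib/apt/lists/*
```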
The dependencies are being installed manually to facilitate use of the Chromium binary that’s bundled with Puppeteer. This ensures consistency between Puppeteer releases and avoids the possibility of a new Chrome release arriving with incompatibilities that break Puppeteer.
Now run npm install puppeteer in your local working directory. This will create a package.json and package-lock.json for you to use. In your Dockerfile, copy these files into the container and use npm ci to install Puppeteer.
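In the Dockerfile, that step could look something like this (the /app working directory is just an example):

```dockerfile
# Copy the dependency manifests first so the npm ci layer stays cached
# until package.json or package-lock.json change.
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
```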
The final step is to make Puppeteer’s bundled Chromium binary properly executable. Otherwise, you’ll run into permission errors whenever Puppeteer tries to start Chrome.
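One way to do this is to relax the permissions on the directory Puppeteer downloads Chromium into. The path below applies to older Puppeteer releases that store the browser under node_modules; newer releases cache the browser under ~/.cache/puppeteer instead, so adjust it to match your version.

```dockerfile
# Make the bundled Chromium binary executable by non-root users.
# Path is for older Puppeteer releases; newer versions cache the
# browser under ~/.cache/puppeteer instead.
RUN chmod -R o+rx node_modules/puppeteer/.local-chromium
```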
You might want to manually install a specific Chrome version in customized environments. Setting the PUPPETEER_SKIP_CHROMIUM_DOWNLOAD environment variable before you run npm ci will disable Puppeteer’s own browser download during installation, which helps slim down your final image. If you skip the download, you’ll need to tell Puppeteer where to find your browser, either with the PUPPETEER_EXECUTABLE_PATH environment variable or the executablePath launch option.
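As a sketch, the relevant Dockerfile lines might look like this, assuming you install Chromium from your distribution’s repositories; the package name and binary path vary between distributions.

```dockerfile
# Skip Puppeteer's own Chromium download when npm ci runs.
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true

# Install the distribution's Chromium package instead (Debian shown here).
RUN apt-get update \
    && apt-get install -y --no-install-recommends chromium \
    && rm -rf /var/lib/apt/lists/*

# Point Puppeteer at the installed browser binary.
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium
```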
At this point you should be ready to build your image:
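Assuming your Dockerfile sits in the current directory, a command along these lines will build it (the image tag is just an example):

```bash
docker build -t puppeteer-docker:latest .
```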
This is a fairly large build process which could take several minutes on a slower internet connection.
Using Puppeteer in Docker
Some special considerations apply to launching Chrome when you’re using Puppeteer in a Dockerized environment. Despite installing all the dependencies, the environment still looks different to most regular Chrome installations, so additional launch flags are required.
Here’s a minimal example of using Puppeteer inside your container:
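The following sketch shows such a script; the example.com URL and screenshot.png filename are placeholders.

```js
// screenshot.js - minimal Puppeteer example for a Docker container.
const puppeteer = require("puppeteer");

(async () => {
    // Launch headless Chrome with flags suited to a containerized environment.
    const browser = await puppeteer.launch({
        headless: true,
        args: [
            "--disable-gpu",
            "--disable-dev-shm-usage",
            "--no-sandbox",
            "--disable-setuid-sandbox",
        ],
    });

    const page = await browser.newPage();
    await page.goto("https://example.com");
    await page.screenshot({ path: "screenshot.png" });

    // Close the browser so the container doesn't keep Chrome running.
    await browser.close();
})();
```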
This demonstrates a simple script that launches a headless Chrome instance, navigates to a URL, and captures a screenshot of the page. The browser is then closed to avoid wasting system resources.
The important section is the arguments list that’s passed to Chromium as part of the launch() call:
- --disable-gpu – The GPU isn’t usually available inside a Docker container, unless you’ve specially configured the host. Setting this flag explicitly instructs Chrome not to try to use GPU-based rendering.
- --no-sandbox and --disable-setuid-sandbox – These disable Chrome’s sandboxing, a step which is required when running as the root user (the default in a Docker container). Using these flags could allow malicious web content to escape the browser process and compromise the host, so it’s vital you ensure your Docker containers are strongly isolated from your host. If you’re uncomfortable with this, you’ll need to manually configure working Chrome sandboxing, which is a more involved process.
- --disable-dev-shm-usage – This flag is necessary to avoid running into issues with Docker’s default shared memory space of 64MB. Chrome will write shared memory files into /tmp instead.
Add your JavaScript to your container with a COPY instruction, as shown below. Provided you pass the launch flags described above, you should find Puppeteer executes successfully.
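For example, continuing with the screenshot.js filename used earlier:

```dockerfile
# Add the automation script and run it when the container starts.
COPY screenshot.js ./
CMD ["node", "screenshot.js"]
```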
Conclusion
Running Puppeteer in a Docker container lets you automate webpages as part of your CI pipelines and production infrastructure. It also helps you isolate your environment during development, so you don’t need to install Chrome locally.
Your container needs to have the right dependencies installed. You must also set Chrome launch arguments so the browser operates correctly in your Dockerized environment. Afterwards, you should be able to use the Puppeteer API with no further special considerations.
It is worth paying attention to Chrome’s resource usage. Launching multiple browsers in a single container instance could quickly exhaust Docker memory limits. Either raise the limits on your container or implement a system that restricts script concurrency or reuses running browser instances.
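As one illustrative approach to reuse, you could launch a single browser and give each task its own page, rather than starting a new Chrome instance per task. The helper below and its URLs are purely examples.

```js
// A minimal sketch of reusing one browser instance across multiple tasks.
const puppeteer = require("puppeteer");

async function withPage(browser, fn) {
    // Each task gets its own tab, which is far cheaper than a new browser.
    const page = await browser.newPage();
    try {
        return await fn(page);
    } finally {
        await page.close();
    }
}

(async () => {
    const browser = await puppeteer.launch({
        args: ["--disable-gpu", "--disable-dev-shm-usage", "--no-sandbox", "--disable-setuid-sandbox"],
    });

    // Example tasks; URLs and filenames are placeholders.
    await withPage(browser, async (page) => {
        await page.goto("https://example.com");
        await page.screenshot({ path: "one.png" });
    });
    await withPage(browser, async (page) => {
        await page.goto("https://example.org");
        await page.screenshot({ path: "two.png" });
    });

    await browser.close();
})();
```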