Cheerio vs Puppeteer: which to use for scraping
- Author: Hamza Rahman
- 3 mins read
Both Cheerio and Puppeteer show up when you scrape with Node.js, but they solve different problems. The short version: Cheerio parses HTML, Puppeteer runs a browser. Picking the wrong one is one of the most common reasons a scraper returns empty data.
Cheerio: a fast HTML parser
Cheerio takes an HTML string and gives you a jQuery-style API to query it. It does not open a browser and it does not run JavaScript. You fetch the HTML yourself, usually with Axios or fetch, and hand it to Cheerio.
```javascript
const axios = require('axios')
const cheerio = require('cheerio')

// Wrapped in an async function because CommonJS has no top-level await
async function main() {
  const { data: html } = await axios.get('https://example.com')
  const $ = cheerio.load(html) // parse the fetched HTML
  return $('h1').text()
}
main().then(console.log)
```

Because there is no browser, Cheerio is fast and uses very little memory. The catch: it only sees the HTML the server sends. If the page builds its content in the browser with React, Vue, or Angular, that content is not in the initial HTML, so Cheerio cannot find it.
Puppeteer: a real headless browser
Puppeteer launches a headless Chrome and controls it from code. It runs the page's JavaScript, so you get the same DOM a user would see after the page loads. You can also click buttons, fill forms, scroll, wait for elements, and take screenshots.
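```javascript
const puppeteer = require('puppeteer')

// Wrapped in an async function because CommonJS has no top-level await
async function main() {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  // networkidle0: wait until the network has been idle, so client-side rendering can finish
  await page.goto('https://example.com', { waitUntil: 'networkidle0' })
  const title = await page.$eval('h1', (el) => el.textContent)
  await browser.close()
  return title
}
main().then(console.log)
```

That power costs you speed and memory. Starting a browser is slow compared with a plain HTTP request, and running many pages at once is heavy.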
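One way to keep that cost under control is to cap how many pages are open at the same time. The sketch below is an illustration, not part of Puppeteer: `mapWithLimit` and `scrapeTitles` are hypothetical helper names, and the limit of 3 is an arbitrary assumption you would tune for your machine.

```javascript
// Run `worker(item)` for every item, with at most `limit` calls in flight.
// Results come back in the same order as `items`.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length)
  let next = 0

  // Each runner keeps claiming the next unprocessed index until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++
      results[index] = await worker(items[index], index)
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length))
  await Promise.all(Array.from({ length: runnerCount }, () => runner()))
  return results
}

// Hypothetical usage: one shared browser, at most 3 pages open at once.
async function scrapeTitles(urls) {
  const puppeteer = require('puppeteer') // loaded here so the helper above has no dependencies
  const browser = await puppeteer.launch()
  try {
    return await mapWithLimit(urls, 3, async (url) => {
      const page = await browser.newPage()
      try {
        await page.goto(url, { waitUntil: 'networkidle0' })
        return await page.$eval('h1', (el) => el.textContent)
      } finally {
        await page.close() // free the tab even if the page failed
      }
    })
  } finally {
    await browser.close()
  }
}
```

Sharing one browser and closing each page as soon as it is done avoids paying the startup cost per URL. If any worker throws, the returned promise rejects, so wrap the worker body in your own error handling if you want partial results.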
Which one to use
| Situation | Use |
|---|---|
| Server-rendered or static HTML | Cheerio |
| Content rendered by JavaScript in the browser | Puppeteer |
| You need to click, log in, scroll, or wait | Puppeteer |
| Speed and low memory matter, many pages | Cheerio |
| Screenshots or PDFs | Puppeteer |
A quick test: open the page, view source (the raw HTML, not the inspector), and search for the data you want. If it is there, Cheerio is enough. If it only appears in the live DOM, you need Puppeteer.
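The same check can be scripted. This is a rough sketch under a few assumptions: `isInRawHtml`, `suggestTool`, and `checkPage` are hypothetical names, the match is a plain case-insensitive substring search, and `checkPage` relies on the global `fetch` available in Node.js 18 and later.

```javascript
// True when `needle` appears anywhere in the raw HTML string (case-insensitive).
function isInRawHtml(html, needle) {
  return html.toLowerCase().includes(needle.toLowerCase())
}

// Suggest a tool based on whether the data is already in the server's HTML.
function suggestTool(html, needle) {
  return isInRawHtml(html, needle) ? 'cheerio' : 'puppeteer'
}

// Fetch the raw HTML (no page JavaScript is run) and apply the check.
async function checkPage(url, needle) {
  const response = await fetch(url)
  if (!response.ok) throw new Error(`Request failed: ${response.status}`)
  return suggestTool(await response.text(), needle)
}

// Two made-up HTML responses to show both outcomes.
const serverRendered = '<html><body><h1>Blue Widget</h1></body></html>'
const clientRendered =
  '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

console.log(suggestTool(serverRendered, 'Blue Widget')) // cheerio
console.log(suggestTool(clientRendered, 'Blue Widget')) // puppeteer
```

Treat the result as a hint rather than a verdict: text that is HTML-escaped or split across tags in the source will not match a plain substring search.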
Using both together
You can combine them. Let Puppeteer load the page and run its JavaScript, grab the rendered HTML, then parse it with Cheerio's lighter API.
```javascript
// `page` is a Puppeteer page that has already finished page.goto()
const html = await page.content()
const $ = cheerio.load(html)
```

This is handy when a page needs a browser to render but you prefer Cheerio's selector syntax for the actual extraction.
Related
- Web scraping with Node, Axios and Cheerio: a full Cheerio tutorial with selectors, filtering, pagination, and CSV output.
- Cheerio vs node-html-parser: when you want something even lighter than Cheerio.

