Cheerio vs Puppeteer: which to use for scraping

Both Cheerio and Puppeteer show up when you scrape with Node.js, but they solve different problems. The short version: Cheerio parses HTML, Puppeteer runs a browser. Picking the wrong one is a common reason a scraper returns empty data.

Cheerio: a fast HTML parser

Cheerio takes an HTML string and gives you a jQuery-style API to query it. It does not open a browser and it does not run JavaScript. You fetch the HTML yourself, usually with Axios or fetch, and hand it to Cheerio.

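const axios = require('axios')
const cheerio = require('cheerio')

// Top-level await is not available in CommonJS, so wrap the calls
async function main() {
  const { data: html } = await axios.get('https://example.com')
  const $ = cheerio.load(html) // parse the HTML string
  const title = $('h1').text()
  console.log(title)
}

main().catch(console.error)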

Because there is no browser, Cheerio is fast and uses very little memory. The catch: it only sees the HTML the server sends. If the page builds its content in the browser with React, Vue, or Angular, that content is not in the initial HTML, so Cheerio cannot find it.

Puppeteer: a real headless browser

Puppeteer launches a headless Chrome browser and controls it from code. It runs the page's JavaScript, so you get the same DOM a user would see after the page loads. You can also click buttons, fill forms, scroll, wait for elements, and take screenshots.

const puppeteer = require('puppeteer')

async function main() {
  const browser = await puppeteer.launch()
  try {
    const page = await browser.newPage()
    await page.goto('https://example.com', { waitUntil: 'networkidle0' })
    const title = await page.$eval('h1', (el) => el.textContent)
    console.log(title)
  } finally {
    await browser.close() // runs even if a step above throws
  }
}
main().catch(console.error)

That power costs you speed and memory. Starting a browser is slow compared with a plain HTTP request, and running many pages at once is heavy.
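
One way to keep that cost in check is to cap how many pages are open at the same time. The helper below is a minimal sketch of that idea, not a Puppeteer feature: `mapWithLimit` is a hypothetical name, and `worker` stands in for whatever function of yours opens a page and scrapes it.

```javascript
// Run `worker` over `items`, keeping at most `limit` calls in flight.
// Results come back in the same order as `items`.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length)
  let next = 0

  // Each runner claims the next unprocessed index until none are left.
  async function runner() {
    while (next < items.length) {
      const i = next++
      results[i] = await worker(items[i], i)
    }
  }

  // Start `limit` runners (or fewer if there are not that many items).
  const count = Math.min(limit, items.length)
  await Promise.all(Array.from({ length: count }, runner))
  return results
}
```

For example, `mapWithLimit(urls, 3, (url) => scrapeTitle(browser, url))` would visit every URL with at most three pages open, where `scrapeTitle` is a hypothetical function that opens a page, reads the title, and closes the page again. If any worker rejects, the returned promise rejects too, so wrap the worker in a try/catch if you would rather collect failures.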

Which one to use

Situation	Use
Server-rendered or static HTML	Cheerio
Content rendered by JavaScript in the browser	Puppeteer
You need to click, log in, scroll, or wait	Puppeteer
Many pages, where speed and low memory matter	Cheerio
Screenshots or PDFs	Puppeteer

A quick test: open the page, view source (the raw HTML, not the inspector), and search for the data you want. If it is there, Cheerio is enough. If it only appears in the live DOM, you need Puppeteer.
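
The view-source check can also be roughed out in code. The sketch below assumes Node 18 or later for the built-in `fetch`; `sourceContains` and `suggestTool` are hypothetical names chosen for this example. It does a plain, case-insensitive substring match on the raw HTML, so treat the answer as a hint rather than a verdict.

```javascript
// True when `needle` appears in the raw HTML string (case-insensitive).
function sourceContains(html, needle) {
  return html.toLowerCase().includes(needle.toLowerCase())
}

// Fetch the raw HTML, which runs none of the page's JavaScript,
// and suggest a tool based on whether the sample text is already there.
async function suggestTool(url, needle) {
  const res = await fetch(url) // global fetch, Node 18+
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`)
  const html = await res.text()
  return sourceContains(html, needle) ? 'Cheerio' : 'Puppeteer'
}
```

Pick a `needle` that is part of the data you want, such as a product name you can see on the page. One caveat: a match inside a script tag or an embedded JSON blob still counts as found, even though a CSS selector would not reach it.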

Using both together

You can combine them. Let Puppeteer load the page and run its JavaScript, grab the rendered HTML, then parse it with Cheerio's lighter API.

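// `page` is a Puppeteer page that has already loaded the URL
const html = await page.content() // HTML after the page's scripts have run
const $ = cheerio.load(html)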

This is handy when a page needs a browser to render but you prefer Cheerio's selector syntax for the actual extraction.
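
Put together, the browser half of that flow might look like the sketch below. `renderHtml` is a hypothetical helper name; it assumes you pass in a browser already started with `puppeteer.launch()`, so that one browser can be reused across several URLs.

```javascript
// Load `url` in a new tab of an already-launched Puppeteer `browser`
// and return the HTML after the page's JavaScript has run.
async function renderHtml(browser, url) {
  const page = await browser.newPage()
  try {
    await page.goto(url, { waitUntil: 'networkidle0' })
    return await page.content()
  } finally {
    await page.close() // free the tab even if navigation fails
  }
}

// Usage inside an async function (assumes puppeteer and cheerio are installed):
//   const browser = await puppeteer.launch()
//   const $ = cheerio.load(await renderHtml(browser, 'https://example.com'))
//   console.log($('h1').text())
//   await browser.close()
```

Keeping the rendering step separate from the parsing step means the Cheerio code stays the same whether the HTML came from a plain HTTP request or from a browser.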

Related articles

Web scraping with Node, Axios and Cheerio: a full Cheerio tutorial with selectors, filtering, pagination, and CSV output.
Cheerio vs node-html-parser: when you want something even lighter than Cheerio.
