
Data scraping

Browserbase vs Browserless vs Playwright grid: per-1k cost

A Dutch retailer needs 180,000 competitor price loads a day. Three headless options. The marketing pages give you everything except the bill at the end of the month.

Jacob Molkenboer · Founder · A Brand New Company · 3 Nov 2024 · 8 min

You run pricing for a Dutch electronics retailer. Margin is down twelve basis points this quarter, and the board wants daily visibility on the eighteen competitors that matter. Your engineer says she can do it: a headless browser, Playwright, parsed JSON-LD where it exists and a hand-written extractor where it doesn't. About 180,000 page-loads a day, spread between 03:00 and 07:00 Amsterdam time so the dashboard is fresh when the buyers walk in.

She asks which browser farm to put under it. Browserbase? Browserless? A grid on our own Hetzner boxes? You forward the three pricing pages to procurement. Procurement comes back two hours later with one question. What does it actually cost per thousand pages?

The marketing pages never say.

The shape of the pipeline

Eighteen competitors. Roughly 10,000 SKUs each. Most pages load in three to five seconds with full JavaScript. About 30% sit behind aggressive bot protection (Cloudflare Turnstile, DataDome, the usual). About 8% throw a CAPTCHA on the second visit from the same IP.

That gives you four cost levers, and only one of them is the browser:

  • Browser-time (the lever the vendors price on)
  • Bandwidth and CAPTCHA solves (each solve costs a real fraction of a cent)
  • Residential or mobile proxies (the line item that quietly dwarfs the rest)
  • Failure rate (every retry means paying for two page-loads to get one)

We have built this pipeline three times over the years. The browser bill is almost never the biggest line. But it is the easiest one to get wrong by a factor of five.
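The failure-rate lever is the easiest of the four to put a number on. A minimal sketch, assuming failures are independent between attempts and each page is retried up to a fixed number of attempts; `expectedLoads`, `billableLoads` and `maxAttempts` are illustrative names, not part of any vendor API.

```typescript
// Expected page-loads per target page when each attempt fails with
// probability `failureRate` and a page gets at most `maxAttempts` attempts.
// Assumption: failures are independent between attempts.
function expectedLoads(failureRate: number, maxAttempts: number): number {
  if (failureRate < 0 || failureRate >= 1) {
    throw new RangeError('failureRate must be in [0, 1)')
  }
  if (!Number.isInteger(maxAttempts) || maxAttempts < 1) {
    throw new RangeError('maxAttempts must be a positive integer')
  }
  let loads = 0
  let reachProbability = 1 // chance that this attempt is needed at all
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    loads += reachProbability
    reachProbability *= failureRate
  }
  return loads
}

// Billable loads per day for a given number of target pages.
function billableLoads(pagesPerDay: number, failureRate: number, maxAttempts: number): number {
  return pagesPerDay * expectedLoads(failureRate, maxAttempts)
}
```

If, say, 8% of loads fail and you allow one retry, 180,000 target pages become about 194,400 billable loads a day. Every lane below is billed on that larger number.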

Browserbase, the managed lane

Browserbase sells browser-hours. You connect over CDP, they spin up a fresh Chromium with fingerprint randomisation, residential proxy optional, video recording optional, CAPTCHA passthrough optional. At time of writing their public pricing lists three plans: a Developer tier at $39 a month for 200 browser-hours, a Startup tier at $99 for 500, and a Scale tier they negotiate.

The number procurement actually needs sits in the small print: overage is $0.20 per browser-hour on the public plans.

So you sit down with the arithmetic. At three seconds active per page, 180,000 pages a day is 150 browser-hours a day, or 4,500 a month. Subtract the 500 included on the Startup plan, that is 4,000 hours of overage.

Browserbase headline (3s/page):
  $99 (Startup) + 4,000 x $0.20  = $899 / month
  $899 / 5,400 (k pages)         = $0.166 per 1k pages

That's the floor. Once DataDome starts holding sessions open for thirty seconds on a particularly nasty competitor, your three-second average page-load drifts toward seven seconds. Recalculate.

Realistic (7s/page on hardened targets):
  350 hours/day = 10,500 / month
  $99 + 10,000 x $0.20            = $2,099 / month
  $2,099 / 5,400                  = $0.389 per 1k pages

Same pipeline. Different competitor mix. Roughly 2.3 times the bill. The vendor's pricing page is honest, but it asks you to know your own page-time distribution before it tells you anything.
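The two calculations above are the same formula with a different page-time. A small sketch of it, assuming a 30-day month and that browser-time is billed exactly as used; the plan figures are the ones quoted above, and `browserHourBill` and `STARTUP_PLAN` are illustrative names.

```typescript
// Plan figures as quoted in this article; check the live pricing page
// before relying on them.
const STARTUP_PLAN = { baseUsd: 99, includedHours: 500, overageUsdPerHour: 0.2 }

interface MonthlyBill {
  browserHours: number
  totalUsd: number
  usdPer1kPages: number
}

// Converts a per-page browser time into a monthly bill.
// Assumptions: 30-day month, no rounding of sessions by the vendor.
function browserHourBill(
  pagesPerDay: number,
  secondsPerPage: number,
  plan = STARTUP_PLAN,
  daysPerMonth = 30,
): MonthlyBill {
  const browserHours = (pagesPerDay * secondsPerPage * daysPerMonth) / 3600
  const overageHours = Math.max(0, browserHours - plan.includedHours)
  const totalUsd = plan.baseUsd + overageHours * plan.overageUsdPerHour
  const thousandsOfPages = (pagesPerDay * daysPerMonth) / 1000
  return { browserHours, totalUsd, usdPer1kPages: totalUsd / thousandsOfPages }
}
```

`browserHourBill(180000, 3)` reproduces the $899 headline and `browserHourBill(180000, 7)` the $2,099 case. Feeding in your own blended average, for example 4.2 seconds if 70% of pages take 3s and 30% take 7s, lands somewhere in between.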

Browserless, priced by the unit

Browserless v2 sells units. A unit is roughly thirty seconds of browser-time or one short session, whichever ends first. Their public Scale plan lists around $200 a month for 3 million units, with per-unit overage beyond that.

The unit model is forgiving for short sessions and punishing for long ones. For 180k pages a day:

Browserless (1 page ~= 1 unit, single session reused):
  5.4M units/month, 3M included, 2.4M overage at ~$0.0001
  $200 + $240                     = $440 / month
  $440 / 5,400                    = $0.081 per 1k pages

Browserless realistic (sessions rotated every ~10 pages
to duck fingerprint blocks, +30% units):
  $440 + ~$240 of extra units     = $680 / month
  $680 / 5,400                    = $0.126 per 1k pages

The catch is that reuse in their pricing model means staying inside one session. The moment you tear a session down for fingerprint reasons (and you will, often, because retailers will block the second pageview from the same browser) the unit count climbs. There is a billing breakdown in the Browserless docs that walks through how units convert from sessions. Read it before you sign. Then add 30% in your spreadsheet.
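The unit model reduces to one input you have to measure yourself: units consumed per page. A hedged sketch using the approximate plan figures quoted above; `unitBill`, `SCALE_PLAN` and `unitsPerPage` are illustrative names, and the real conversion from sessions to units is whatever the Browserless billing docs say it is.

```typescript
// Approximate plan figures as quoted in this article, not an official rate card.
const SCALE_PLAN = { baseUsd: 200, includedUnits: 3000000, overageUsdPerUnit: 0.0001 }

// Monthly bill for a unit-priced plan.
// `unitsPerPage` is 1 when sessions are reused perfectly and grows
// with every session you tear down early. Assumes a 30-day month.
function unitBill(
  pagesPerDay: number,
  unitsPerPage: number,
  plan = SCALE_PLAN,
  daysPerMonth = 30,
) {
  const units = pagesPerDay * daysPerMonth * unitsPerPage
  const overageUnits = Math.max(0, units - plan.includedUnits)
  const totalUsd = plan.baseUsd + overageUnits * plan.overageUsdPerUnit
  const thousandsOfPages = (pagesPerDay * daysPerMonth) / 1000
  return { units, totalUsd, usdPer1kPages: totalUsd / thousandsOfPages }
}
```

With `unitsPerPage` at 1 this gives the $440 baseline. A flat 1.3 gives about $602, while the ~$680 figure above corresponds to roughly 1.44 units per page, which is one more reason to measure rotation overhead on your own traffic rather than assume it.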

Self-hosted Playwright grid

This is where the marketing pages stop helping at all, because nobody is selling it to you.

A grid is three things: a Playwright install, a pool of worker processes, and a queue in front. We usually run it on three Hetzner CCX33 nodes (8 dedicated vCPU, 32 GB RAM) at roughly €60 a month each. Each node holds twelve concurrent Chromium contexts comfortably; we cap at ten for headroom.

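// One worker. Real, runnable (Node ESM, top-level await).
// Reads UA, PROXY_URL and URL from the environment.
import { chromium } from 'playwright'

const browser = await chromium.launch({
  // Turns off the AutomationControlled Blink feature,
  // which otherwise sets navigator.webdriver.
  args: ['--disable-blink-features=AutomationControlled'],
})

try {
  // One isolated context per job: Dutch locale, Amsterdam clock, proxied.
  const ctx = await browser.newContext({
    locale: 'nl-NL',
    timezoneId: 'Europe/Amsterdam',
    userAgent: process.env.UA,
    viewport: { width: 1366, height: 768 },
    proxy: { server: process.env.PROXY_URL! },
  })

  const page = await ctx.newPage()
  await page.goto(process.env.URL!, { waitUntil: 'domcontentloaded' })

  // Prefer the machine-readable content attribute, fall back to visible text.
  const price = await page.evaluate(() => {
    const el = document.querySelector('[itemprop="price"]')
    return el?.getAttribute('content') ?? el?.textContent?.trim() ?? null
  })

  await ctx.close()
  console.log(JSON.stringify({ url: process.env.URL, price }))
} finally {
  // Always release Chromium, even when navigation or extraction throws.
  await browser.close()
}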
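The other two pieces, the worker pool and the queue in front, can be sketched without Playwright at all. This is a minimal in-memory version under stated assumptions: `runPool` and `Job` are hypothetical names, the queue is just an array, and a production grid would more likely sit behind a durable queue. Each job would wrap one context-and-page cycle like the worker above.

```typescript
// A job is any async unit of work, e.g. "open a context, load one page".
type Job<T> = () => Promise<T>

// Runs jobs with at most `concurrency` in flight at once.
// Results come back in the same order as the input jobs.
// Note: the first job that throws rejects the whole run.
async function runPool<T>(jobs: Job<T>[], concurrency: number): Promise<T[]> {
  const results: T[] = new Array(jobs.length)
  let next = 0 // index of the next job to hand out

  // Each worker keeps pulling jobs until the queue is drained.
  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const index = next++
      results[index] = await jobs[index]()
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, jobs.length))
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results
}
```

Calling `runPool(jobs, 10)` on each node mirrors the cap of ten contexts per node described above.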

Capacity check:

30 concurrent contexts across 3 nodes
4s average page-time (3s active + 1s overhead)
30 / 4   = 7.5 pages/second sustained
7.5 x 86,400  = 648,000 pages/day theoretical max
180,000 actual = 28% utilisation over a full 24 hours, headroom for retries and bursts
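The same check as a function, with the scraping window as an explicit input. A sketch only: it assumes every context stays busy for the whole window, and `gridCapacity` and `utilisation` are illustrative names.

```typescript
// Pages a grid can load in a window when every context finishes
// one page every `secondsPerPage` seconds (no idle time assumed).
function gridCapacity(contexts: number, secondsPerPage: number, windowHours: number): number {
  const pagesPerSecond = contexts / secondsPerPage
  return pagesPerSecond * windowHours * 3600
}

// Fraction of that capacity the daily target consumes.
function utilisation(
  pagesPerDay: number,
  contexts: number,
  secondsPerPage: number,
  windowHours: number,
): number {
  return pagesPerDay / gridCapacity(contexts, secondsPerPage, windowHours)
}
```

`gridCapacity(30, 4, 24)` gives the 648,000 above. The 28% figure assumes the load is spread across the whole day; `gridCapacity(30, 4, 4)` is 108,000, so fitting 180,000 pages into a strict 03:00 to 07:00 window would take roughly 50 contexts at 4s per page, or a wider window.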

Per-1k pages, two ways to read it:

3 x €60 hardware                  = €180 / month
Grafana Cloud free tier             = €0
Engineer babysitting (~2h/week)     = ~€800 / month at our rate
-----------------------------------
Loaded steady state                 = €980 / month
€980 / 5,400                       = €0.181 per 1k pages

Without the engineer line item       = €0.033 per 1k pages

The honest number depends on whether you already employ someone who can babysit a grid, or whether you have to hire one.
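Both readings come from one line of arithmetic, which makes it easy to plug in your own engineer rate. A minimal sketch, assuming a 30-day month; `selfHostedPer1k` is an illustrative name.

```typescript
// Per-1k-page cost of a self-hosted grid in euros.
// Set `engineerEurPerMonth` to 0 for the raw-infrastructure reading.
function selfHostedPer1k(
  nodes: number,
  eurPerNode: number,
  engineerEurPerMonth: number,
  pagesPerDay: number,
  daysPerMonth = 30,
): number {
  const monthlyEur = nodes * eurPerNode + engineerEurPerMonth
  const thousandsOfPages = (pagesPerDay * daysPerMonth) / 1000
  return monthlyEur / thousandsOfPages
}
```

`selfHostedPer1k(3, 60, 800, 180000)` gives the loaded €0.181 and `selfHostedPer1k(3, 60, 0, 180000)` the raw €0.033.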

Warning

None of these numbers include residential proxies. Realistic proxy spend for 180,000 Dutch e-commerce pages a day is €600 to €1,400 a month depending on provider and how aggressive the targets are. That is the line item your CFO will ask about, and none of the three browser farms include it.

Per-1k pages, side by side

Pipeline: 180,000 Dutch e-commerce price pages/day
Average page-time 4s. Bot-protection share ~30%.

                         Cost / month    Per 1k pages
Browserbase Startup      $   899         $0.166
Browserbase realistic    $ 2,099         $0.389
Browserless Scale        $   440         $0.081
Browserless realistic    $   680         $0.126
Self-hosted (raw infra)  €   180         €0.033
Self-hosted (loaded)     €   980         €0.181

Three honest reads of the same table:

  1. If you have nobody in-house, Browserless is the cheapest credible option for this shape of traffic. The unit model rewards short scrapes.
  2. If your engineer is already there and bored, the grid is roughly half the realistic Browserbase bill (about $0.39 per 1k at 7s/page) even loaded, and you keep the operational knowledge.
  3. Browserbase is what you buy when the differentiator is the fingerprint stack and the CAPTCHA passthrough, not the browser-time. For the 30% of pages behind real bot protection, it earns the premium. For the other 70%, you are paying for capability you do not use.

Where each one actually wins

Browserbase wins on hard targets. If your scraper spends more time fighting Cloudflare Turnstile and DataDome than it does loading product pages, the managed fingerprint surface is worth the markup. We put their grid behind a router that only sends the hard targets there, never the easy ones.

Browserless wins on shape. Short, predictable sessions, lots of them, low bot-protection share. Marketplaces with public APIs you are augmenting. Internal QA pipelines. Anything where one page equals one unit and you never have to rotate.

The self-hosted grid wins on scale and on knowledge. Past a million pages a day, the per-1k rates above put the managed bill somewhere between a few thousand dollars a month and five figures, with little room to renegotiate it. The grid also keeps the failure modes inside your team, which is a feature when something breaks at 04:30 on a Tuesday and the on-call has to know more than which Slack channel to ping.

What we ended up doing

When we built the price-monitoring pipeline for a Dutch fashion retailer last year, we routed the 65% of traffic that was unprotected through a four-node Playwright grid, and bought a small Browserbase bucket for the protected hosts. A simple URL-hash router decided, per request, which lane to use. The blended cost landed at €0.11 per 1k pages including proxies, monitoring, and our retainer. The same hybrid shape is how we build any data-pipeline work at that volume: managed where it earns the markup, self-hosted where it doesn't.
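A router of this kind can be very small. The sketch below is one possible shape, not the production router: it assumes the lane is chosen by hostname from a hand-maintained list of protected hosts, and that a stable hash of the URL picks the grid node. `HARD_HOSTS`, `route` and the example hostnames are hypothetical.

```typescript
type Lane = 'managed' | 'grid'

// Hypothetical list of hosts known to sit behind hard bot protection.
const HARD_HOSTS = new Set(['shop-a.example', 'shop-b.example'])

// FNV-1a style 32-bit hash over UTF-16 code units. It is stable across
// processes, so the same URL always maps to the same grid node.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash
}

// Protected hosts go to the managed lane; everything else is spread
// across `gridNodes` self-hosted nodes by URL hash.
function route(
  rawUrl: string,
  gridNodes: number,
  hardHosts: Set<string> = HARD_HOSTS,
): { lane: Lane; node?: number } {
  const { hostname } = new URL(rawUrl)
  if (hardHosts.has(hostname)) return { lane: 'managed' }
  return { lane: 'grid', node: fnv1a(rawUrl) % gridNodes }
}
```

Keeping the protected-host list as data rather than code means a newly hardened competitor is a config change, not a deploy.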

The five-minute audit

Open your scraper logs. Pull the mean page-time (the median hides the slow, bot-protected tail). Multiply by your daily page count and divide by 3,600 to get browser-hours per day. That number times 30 is your monthly hours. Monthly hours beyond the plan's included allowance, times $0.20, plus the plan fee, is your Browserbase floor. Five minutes of arithmetic tells you whether the vendor you are about to sign with is priced for your traffic shape, or someone else's.
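If the logs are already in a script, the audit is a few lines. A hedged sketch: `auditPageTimes` is an illustrative name, the input is assumed to be page-times in seconds, and the dollar figure is usage at the overage rate only, ignoring the plan fee and included hours. It reports the median next to the mean because total browser-time is mean times count, and a slow tail pulls the two apart.

```typescript
// Summarises logged page-times (seconds) into browser-hours.
// `usageUsdPerMonth` prices every hour at the overage rate; it does not
// subtract included hours or add the plan fee.
function auditPageTimes(pageTimesSec: number[], pagesPerDay: number, overageUsdPerHour = 0.2) {
  if (pageTimesSec.length === 0) throw new Error('no page-time samples')

  const sorted = [...pageTimesSec].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  const median = sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
  const mean = sorted.reduce((sum, t) => sum + t, 0) / sorted.length

  const hoursPerDay = (mean * pagesPerDay) / 3600
  const hoursPerMonth = hoursPerDay * 30
  return {
    median,
    mean,
    hoursPerDay,
    hoursPerMonth,
    usageUsdPerMonth: hoursPerMonth * overageUsdPerHour,
  }
}
```

For a sample like `[3, 3, 3, 3, 30]` the median is 3 seconds but the mean is 8.4, and it is the 8.4 that ends up on the invoice.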

Key takeaway

At 180k pages a day, blend a self-hosted Playwright grid with a small managed bucket for the hardened targets. Around €0.11 per 1k pages, proxies included.

FAQ

Why don't the vendors publish per-1k-pages pricing?

Because their cost model is browser-time or session-units, not pages. Two scrapers hitting the same URL list can produce wildly different bills depending on average page-time and how often sessions rotate.

Do these numbers include proxies?

No. Residential proxy spend for 180,000 Dutch e-commerce pages a day runs roughly €600 to €1,400 a month and sits outside all three browser-farm bills.

When does self-hosting stop making sense?

When more than half your traffic sits behind hardened bot protection, or when nobody on the team is comfortable babysitting Chromium processes. At that point you are paying for fingerprint engineering, not browser-time.

Can I mix Browserbase and a self-hosted grid?

Yes, and we recommend it past about 100k pages a day. Route hard targets through the managed lane and everything else through your own grid. A URL-hash router is enough.

What about Puppeteer or Selenium grid?

Same maths applies. The cost dimensions are hardware, engineer time, and proxies. Playwright wins on context isolation and tooling, but Selenium Grid will land within 10% on cost for the same workload.

