Headless browser farms: Browserbase, Steel, or self-hosted
Three headless browser layers, one Den Bosch broker, 9,400 weekly scrapes. Per-session cost, captcha survival, and who rotates IPs when a /24 dies.

The quote agent fires at 06:00 on a Tuesday. A 22-person insurance broker in Den Bosch needs roughly 1,300 quotes scraped through a headless browser farm before the office opens, across Independer and Pricewise, plus a private underwriter portal that lives behind a login. The week before, that portal blocked their entire Hetzner /24 for six hours after a careless overnight run. The team called us at 09:40 the same morning.
We had built the agent on a self-hosted Playwright pool. It worked. Until it didn't. Three weeks later we had benchmarked all three obvious options on the same workload, paying for them out of our own pocket so the broker would not eat the cost of a bad decision. This is what we found.
The workload, in numbers
About 9,400 page fetches per week. Of those, roughly 7,800 are Independer and Pricewise listing pages (heavy JS, no captcha most days), and 1,600 are the underwriter portal (login flow, occasional reCAPTCHA v2, session-bound cookies). Average page load 4.2 seconds. Average session length 38 seconds when things go well, closer to two minutes when a captcha or device fingerprint check kicks in.
Run sequentially, that is roughly 100 hours of browser time per week. We parallelise across 12 to 18 workers so the morning batch finishes inside 45 minutes. Then a slower trickle through the day to catch quote updates.
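The arithmetic behind those figures can be sketched as follows. This is a rough model that assumes perfect parallelism and no retries; the function names are illustrative, and because the text does not say how many sessions make up the morning batch, the batch size is left as an input.

```javascript
// Sketch of the capacity arithmetic. Perfect parallelism is assumed,
// so real batches need some headroom on top of these figures.

// Total browser time, in hours, if every session ran back to back.
function sequentialHours(sessions, avgSessionSeconds) {
  return (sessions * avgSessionSeconds) / 3600;
}

// Minimum number of parallel workers needed to finish a batch before a deadline.
function workersNeeded(batchSessions, avgSessionSeconds, deadlineMinutes) {
  const totalSeconds = batchSessions * avgSessionSeconds;
  return Math.ceil(totalSeconds / (deadlineMinutes * 60));
}

// Weekly figure from the text: 9,400 fetches at 38 seconds each.
console.log(sequentialHours(9400, 38).toFixed(1)); // prints "99.2"

// Hypothetical batch: 600 sessions of 38 seconds inside a 45-minute window.
console.log(workersNeeded(600, 38, 45)); // prints 9
```

Sessions that hit a captcha run closer to two minutes, so a worker count derived from the 38-second average is a floor, not a target.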
Browserbase
Browserbase is a managed headless browser cloud. You connect over CDP, they hand you a Chromium session, you drive it with Playwright like normal. Their docs are honest about what is and isn't included.
The pitch that mattered: built-in proxies (datacenter or residential), built-in stealth, captcha solving on the paid tiers, and a full session replay for every run so when something breaks at 03:00 we can scrub the DOM. The hidden cost: per-session pricing adds up faster than the calculator suggests once you stack stealth, residential proxy, and captcha solving on top of the base browser-minute rate.
On our test week, end to end, we paid roughly €0.071 per session for the portal pages (stealth + residential + occasional captcha) and €0.018 per session for the listing pages. Weekly total: about €145. Captcha survival rate over the 1,600 portal pages: 96.4%.
One quiet win: when the underwriter portal banned our pool, Browserbase rotated the residential allocation automatically. We did nothing. Their support confirmed within 20 minutes that they had seen the block and shifted ASN.
Steel.dev
Steel.dev is a newer open-source browser API. You can self-host the OSS core (Apache 2.0) or use their managed cloud. We tried both.
The OSS core is good. It exposes a clean session API on top of Chromium, with hooks for fingerprinting and proxy injection. The managed cloud is cheaper per browser-hour than Browserbase, but the captcha and residential-proxy story is bring-your-own. We wired in Bright Data residential exit nodes and 2Captcha for solving.
Same workload: €0.043 per portal session, €0.012 per listing page. Weekly total: about €98. Captcha survival: 91.7%.
What we paid less for in Steel.dev, we paid for in glue. Every captcha failure was ours to handle. Every proxy rotation policy was ours to write. When the underwriter blocked us mid-batch (one /24 of the Bright Data pool went hot for an hour), the rotation logic in our code was what saved the run. Browserbase had hidden that work from us. Here we owned it.
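The kind of rotation logic described here might look like the sketch below. It is not the code from this project: `fetchWithRotation`, `fetchVia`, and `nextExit` are hypothetical names, and treating HTTP 403 and 429 as block signals is an assumption that would need tuning per target.

```javascript
// Hypothetical helper: retry a fetch through a fresh exit node when the
// target answers with a status that usually signals an IP-level block.
const BLOCK_STATUSES = new Set([403, 429]); // assumption: tune per target

async function fetchWithRotation(fetchVia, nextExit, maxAttempts = 3) {
  let lastStatus = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const exit = await nextExit(attempt);   // ask the pool for a new exit node
    const result = await fetchVia(exit);    // caller-supplied, resolves to { status, body }
    if (!BLOCK_STATUSES.has(result.status)) {
      return { ...result, attempts: attempt };
    }
    lastStatus = result.status;             // blocked: loop and rotate again
  }
  throw new Error(`blocked after ${maxAttempts} attempts (last status ${lastStatus})`);
}
```

Keeping the proxy pool and the fetch function as parameters lets the same retry loop wrap either a Playwright navigation or a plain HTTP call, and makes it easy to exercise with fakes.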
Self-hosted Playwright on a Chromium farm
The thing we already had. Six Hetzner CCX23 nodes, Docker, Playwright with chromium-headless-shell, a Redis queue, and a custom session manager. Pure compute came to €68 for the week. Bright Data residential traffic, at roughly €11.20/GB, added another €74, and 2Captcha another €31. Weekly total: about €173.
Wait. The DIY option is more expensive? Yes, because the Hetzner nodes are sized for the 06:00 peak and idle 70% of the day. The managed services bill per session and effectively absorb the off-peak silence. If you can't bin-pack, you pay for empty seats.
    // What we run on the self-hosted farm
    import { chromium } from 'playwright'
    import { rotateExitNode } from './brightdata.js' // local helper module

    async function fetchQuote(policyId) {
      // Sticky proxy session keyed on the policy ID
      const exit = await rotateExitNode({ session: policyId, sticky: '00:10:00' })
      const browser = await chromium.connectOverCDP('ws://farm:9222')
      try {
        const ctx = await browser.newContext({
          proxy: { server: exit.url, username: exit.user, password: exit.pass },
          viewport: { width: 1366, height: 768 },
          locale: 'nl-NL',
        })
        const page = await ctx.newPage()
        // Skip images to save proxy GB
        await page.route('**/*.{png,jpg,jpeg,webp}', r => r.abort())
        await page.goto(`https://portal.example.nl/quote/${policyId}`)
        // ... extract, post to queue
        await ctx.close()
      } finally {
        // Disconnect from the farm even if navigation or extraction throws
        await browser.close()
      }
    }
Captcha survival on the same workload: 88.9%. Lower because our own fingerprint masking is half a generation behind what Browserbase ships. We hadn't yet implemented the small audio-context randomisation tricks that move reCAPTCHA's risk scoring in our favour.
The /24 incident that started this exercise cost us four engineering hours of hand-rotation and Bright Data zone swaps. During the benchmark, a comparable block cost us zero hours on Browserbase and about forty minutes on Steel.dev managed.
Who carries the pager
This is the question the spreadsheet doesn't answer. When the underwriter's WAF tightens at 02:00 on a Sunday and the morning batch is going to miss the 06:00 deadline, who notices, who escalates, and how fast does the IP pool change hands?
Per-session cost is the loudest number on the page. Pager ownership is the one that wakes you up. Price the second before you price the first.
For a 22-person broker with no on-call engineer, Browserbase's hidden value was that it had silently absorbed two incidents we did not know had happened. On Steel.dev managed, we caught both. On the self-hosted farm, we caught one and the other took the morning batch out for 90 minutes.
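One way to make "who notices" concrete is a projection check that runs while the batch is in flight and raises an alert before the deadline is missed. The sketch below is illustrative only: `projectBatch` is a hypothetical name, all times are epoch milliseconds, and it assumes throughput stays roughly constant for the rest of the batch.

```javascript
// Hypothetical check: given progress so far, will the batch finish by the deadline?
function projectBatch({ total, done, startedAt, now, deadline }) {
  if (done === 0) {
    // No throughput data yet; flag only if the deadline has already passed.
    return { etaMs: null, willMiss: now > deadline };
  }
  const msPerItem = (now - startedAt) / done;       // observed average so far
  const etaMs = now + msPerItem * (total - done);   // linear extrapolation
  return { etaMs, willMiss: etaMs > deadline };
}

// Example: 10 minutes in, 20 of 100 done, 45-minute deadline.
const check = projectBatch({
  total: 100, done: 20, startedAt: 0, now: 600000, deadline: 2700000,
});
console.log(check.willMiss); // prints true
```

Wiring `willMiss` to a pager or a chat webhook is the part that decides who actually gets woken up.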
What we picked
Browserbase, for now. Not because it was the cheapest (it wasn't). Because the broker's ops manager does not want to know what a /24 is, and the cost of one missed morning batch (in client trust, in the WhatsApp she gets from the senior advisor at 06:47) is higher than the €47 we save per week going to Steel.dev managed.
We will revisit the self-hosted option once the agent load grows past 25,000 sessions per week. At that volume, the bin-packing math flips, and the engineering time to own the stack starts paying for itself. We will likely keep Steel.dev's OSS core as the warm fallback if Browserbase ever has an outage during the 06:00 sprint.
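The break-even idea can be sketched with a simple linear model: self-hosting carries a fixed weekly cost plus a small per-session cost, while a managed service is assumed to bill per session only. The function name and the inputs below are placeholders, not the figures behind the estimate above, and the fixed cost should include whatever engineering time the stack consumes.

```javascript
// Linear break-even sketch. Prices are in euros per 1,000 sessions so the
// arithmetic stays on whole numbers.
function breakEvenSessions({ fixedWeekly, selfHostedPer1k, managedPer1k }) {
  const savingPer1k = managedPer1k - selfHostedPer1k;
  if (savingPer1k <= 0) return Infinity; // self-hosting never catches up
  return Math.ceil((fixedWeekly / savingPer1k) * 1000);
}

// Placeholder inputs for illustration only.
console.log(breakEvenSessions({
  fixedWeekly: 120,      // nodes plus engineering time, per week
  selfHostedPer1k: 10,   // proxy and captcha spend per 1,000 sessions
  managedPer1k: 16,      // managed service price per 1,000 sessions
})); // prints 20000
```

Real managed pricing is rarely this linear once tiers and add-ons enter, so the result is better read as an order of magnitude than as a threshold.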
Notes on the residential pool problem
One thing none of the three solves cleanly: when a Dutch portal decides to block a residential ASN, the pool you are renting from has to notice and shift. Bright Data and Oxylabs both rotate, but the speed depends on zone configuration. We learned to set the rotation to sticky 10 minutes for the underwriter portal, and per-request for the listing pages. The portal's login session is bound to its cookies, so it benefits from IP continuity. The listing pages do not.
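The two rotation modes mentioned above, per-request and sticky for a fixed window, can be expressed as a session-key policy. This is a sketch under one stated assumption: the proxy provider keeps the same exit IP for as long as the session key stays the same and assigns a new one when it changes. `sessionKey` and the policy shape are hypothetical, not a provider API.

```javascript
// Hypothetical session-key builder for the two rotation modes.
let requestCounter = 0;

function sessionKey(policy, jobId, nowMs) {
  if (policy.mode === 'per-request') {
    requestCounter += 1;
    return `${jobId}-r${requestCounter}`;           // unique key on every call
  }
  if (policy.mode === 'sticky') {
    const bucket = Math.floor(nowMs / policy.ttlMs); // same key within one TTL window
    return `${jobId}-w${bucket}`;
  }
  throw new Error(`unknown rotation mode: ${policy.mode}`);
}

// Example policies: a 10-minute sticky window and per-request rotation.
const sticky = { mode: 'sticky', ttlMs: 10 * 60 * 1000 };
const perRequest = { mode: 'per-request' };
console.log(sessionKey(sticky, 'job-42', Date.now()));
console.log(sessionKey(perRequest, 'job-42', Date.now()));
```

Note that the sticky key here changes at fixed clock boundaries rather than 10 minutes after first use, which is simpler to reason about but can cut a session short if it starts late in a window.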
Residential proxies are billed by GB. A heavy JS page with images loaded is 3 to 6 MB. The 9,400-page week ran us about 32 GB, with image loading disabled where possible. Disable images at the route level. We forgot, the first week, and paid €112 extra in proxy traffic for the privilege.
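Route-level blocking can also key on the resource type rather than the file extension, which catches images served from extensionless URLs. The sketch below uses Playwright's `page.route`, `route.request().resourceType()`, `route.abort()`, and `route.continue()`; the helper name is illustrative, and blocking `media` and `font` alongside `image` is an assumption that the scraper does not need them to extract data.

```javascript
// Resource types worth dropping on a metered proxy. 'image' is the one the
// text calls for; 'media' and 'font' are an assumption (see lead-in).
const BLOCKED_TYPES = new Set(['image', 'media', 'font']);

// Works with any Playwright Page: aborts blocked types, lets the rest through.
async function blockHeavyResources(page) {
  await page.route('**/*', (route) => {
    const type = route.request().resourceType();
    return BLOCKED_TYPES.has(type) ? route.abort() : route.continue();
  });
}
```

Calling `await blockHeavyResources(page)` before `page.goto(...)` ensures the handler is in place for the first navigation. Some captcha widgets load images as part of the challenge, so the blocked set may need an exception on pages where a captcha is expected.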
One more note on money. Headless agents that drive browsers with a credit card on file can burn through a budget faster than anything else in your stack. The HN front page has carried more than one story about an agent quietly bankrupting its operator over a weekend. Set hard weekly caps on every vendor dashboard before you ship anything that talks to a billing API. Browserbase, Steel.dev managed, and Bright Data all expose them. Use them on day one.
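Vendor-side caps can be backed by a guard inside the agent itself, so a runaway loop stops before the dashboard limit is even reached. A minimal sketch, assuming each billable action has an estimated cost known up front; `WeeklyBudget` is a hypothetical class, and it complements the vendor caps rather than replacing them.

```javascript
// Hypothetical client-side guard: refuse to start work once the weekly cap is hit.
// Amounts are tracked in integer cents to avoid floating-point drift.
class WeeklyBudget {
  constructor(capCents) {
    this.capCents = capCents;
    this.spentCents = 0;
  }

  // Call before each billable action with its estimated cost.
  charge(costCents) {
    if (this.spentCents + costCents > this.capCents) {
      throw new Error(`weekly cap of ${this.capCents} cents would be exceeded`);
    }
    this.spentCents += costCents;
    return this.capCents - this.spentCents; // remaining budget in cents
  }

  // Call from a weekly scheduler tick.
  reset() {
    this.spentCents = 0;
  }
}

// Example: a cap of 200 euros, with one session estimated at 7 cents.
const budget = new WeeklyBudget(20000);
console.log(budget.charge(7)); // prints 19993
```

The counter lives in memory here; an agent that restarts or runs on several workers would need to keep it in shared storage such as the existing Redis instance.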
A five-minute audit you can do today
If you run a scraping agent and you haven't measured it, run this once: count sessions per week, multiply by 60 seconds (an honest average), and price all three options against that number using their public calculators. Then add up the engineering hours your team spent last quarter on browser-fleet incidents. The second number is usually larger than anyone expects, and it is the one that decides which column you live in.
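The audit boils down to two numbers, which the sketch below computes. The function name, the 13-weeks-per-quarter divisor, and the example inputs (including the hourly rate) are assumptions for illustration; the browser-hours figure is what gets priced against each vendor's calculator.

```javascript
// Two-number audit: weekly browser-hours and weekly incident cost.
function weeklyAudit({ sessionsPerWeek, secondsPerSession = 60, incidentHoursLastQuarter, hourlyRate }) {
  const browserHours = (sessionsPerWeek * secondsPerSession) / 3600;
  // Spread last quarter's incident hours over roughly 13 weeks.
  const incidentCostPerWeek = (incidentHoursLastQuarter * hourlyRate) / 13;
  return { browserHours, incidentCostPerWeek };
}

// Hypothetical inputs: 9,000 sessions a week, 26 incident hours last quarter
// at an internal rate of 100 per hour.
console.log(weeklyAudit({
  sessionsPerWeek: 9000,
  incidentHoursLastQuarter: 26,
  hourlyRate: 100,
})); // prints { browserHours: 150, incidentCostPerWeek: 200 }
```

Adding `incidentCostPerWeek` to each option's vendor bill, scaled by how much of the incident work that option leaves with your team, gives a fairer comparison than the per-session price alone.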
When we built the policy-quote AI agent for the broker, the part we underestimated was the residential pool rotation policy. We solved it by letting Browserbase own that layer for now and writing a Steel.dev fallback we can flip to inside 15 minutes. If you are scoping similar work, the first question is not which vendor. It is who carries the pager.
Key takeaway
Per-session price is the loudest number on the page. Who rotates the IP pool when a /24 dies at 02:00 is the one that decides what you pay for.
FAQ
How much does each option cost for 9,400 weekly sessions?
On our test workload, Browserbase came in at about €145/week, Steel.dev managed (with Bright Data and 2Captcha) at €98, and a self-hosted Playwright farm on Hetzner at €173 once proxy and captcha costs were added.
Which option had the best captcha survival rate?
Browserbase, at 96.4% over 1,600 portal sessions. Steel.dev managed reached 91.7% using the same Bright Data proxies and 2Captcha solver as our self-hosted farm, which landed at 88.9% before we shipped audio-context fingerprint patches.
When does self-hosting actually pay off?
Once your weekly browser load is high enough that your fixed nodes stop sitting idle. For us, the break-even is around 25,000 sessions per week. Below that, per-session managed pricing wins because it absorbs your off-peak idle.
Who rotates the residential IP pool when a portal blocks a /24?
On Browserbase, the vendor does it automatically. On Steel.dev managed, you bring your own proxy provider and write the rotation policy. On a self-hosted farm, your engineering team owns the entire rotation and incident response.
Should I disable images when scraping with residential proxies?
Yes. Residential proxies are billed by GB, and a single image-heavy JS page can cost 3 to 6 MB of egress. Route-level image blocking cut our weekly proxy bill by roughly €112.