Autonomous agent costs: a €2,140 captcha-retry meltdown
A research agent ran for four hours overnight. By morning it had spent €2,140 on residential proxies, all of it retrying the same captcha on Bol.com. Here is what went wrong.

The CFO's phone buzzed at 06:42 on a Thursday with a Bright Data billing alert. By the time the Almere operations lead got to her laptop at 08:10, the meter read €2,140.18. The culprit was a research agent that had been started at 02:30 the previous night by a junior analyst who wanted "a clean export of Bol.com sneaker pricing by 09:00." The agent kept its promise in the most expensive way possible. It hit the same Bol.com search page roughly 4,800 times, each attempt routed through a different residential IP, each blocked by the same captcha, each retry charging the firm a few cents of bandwidth.
This is the story of how that happened, what we found when we read the logs, and the four controls we now put on any agent that talks to a paid network.
The setup
The firm sells programmatic ad inventory to Dutch e-commerce brands. Nineteen people, a small ops team, no dedicated platform engineer. Three months ago the analyst team adopted a popular open-source agent framework to automate competitor price scrapes. The agent had access to a headless browser, a Bright Data residential proxy, and a Postgres database. Its system prompt told it to "keep trying until you have the data."
That last instruction sounds harmless. It is not. We have read the same line written into three different agent stacks this quarter, and it is the single most expensive sentence in modern AI engineering.
The Wednesday-night run started normally. The agent fetched the first Bol.com search results page, parsed it, stored ten rows. On the second page, Bol.com returned a captcha challenge. The agent's vision tool could not solve it. The agent's planner concluded that the page had failed to load. It retried. Different IP, same captcha. It retried again. The proxy rotation kept handing it fresh residential exits, each of which Bol.com flagged within milliseconds.
By 03:00 the loop was running at roughly twenty requests per minute. By 06:00 the residential bandwidth meter had ticked past €1,500. The Bright Data dashboard does have spend alerts, but they were configured against the previous quarter's usage, and the alert email landed in a shared mailbox that nobody monitors overnight.
Why the loop never broke
Three things had to go wrong at the same time.
First, the agent had no concept of cost. It saw the proxy as a tool that either returned a page or did not. A captcha looked, to its parser, like a page that "did not load correctly." The framework's default retry policy was "exponential backoff up to ten attempts," but the framework reset the attempt counter every time the planner re-entered the scrape step. Because the planner kept deciding "we still need the second page of results," the ten-attempt cap effectively never fired.
Second, the proxy is billed per gigabyte of residential traffic, not per request. Bright Data's residential pricing at the time of writing starts around $8.40 per GB on pay-as-you-go and drops with volume commits. Bol.com's search page weighs roughly 1.2 MB once images and the embedded captcha widget load. Multiply by 4,800 attempts and you get about 5.8 GB, or roughly $48 at that list price. The repeated search-page fetches alone are therefore only a small slice of the €2,140; the agent's screenshot calls and its retries against the product detail pages it had managed to reach earlier in the run make up the rest.
Third, no human ever saw the agent run. The team had set up a Slack channel for agent output, but the agent only posted on "task complete" or "task failed." A loop that retries forever sits in a state that is neither.
If your agent only reports on terminal states, an infinite loop is invisible by design. The cost meter is the only thing still moving.
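The bandwidth arithmetic in the second point is worth rerunning for your own targets before an agent goes anywhere near them. A minimal Python sketch using the figures quoted above; it assumes decimal units (1 GB = 1,000 MB), and providers differ on how they meter traffic, so check yours.

```python
def residential_cost_usd(page_mb: float, attempts: int, usd_per_gb: float) -> float:
    """Bandwidth cost of fetching the same page `attempts` times through a per-GB proxy.

    Assumes decimal units (1 GB = 1,000 MB); check how your provider meters traffic.
    """
    total_gb = page_mb * attempts / 1000
    return total_gb * usd_per_gb


if __name__ == "__main__":
    # 1.2 MB search page, 4,800 attempts, $8.40 per GB pay-as-you-go
    print(round(residential_cost_usd(1.2, 4800, 8.40), 2))  # 48.38
```

This covers only the repeated search-page fetches. Screenshot calls and detail-page retries are separate traffic and need their own line in the estimate.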
The recurring shape
If this sounds familiar, it is because the same shape has appeared in several agent post-mortems this year. One operator's autonomous reconnaissance agent ran up a four-figure bill probing the DN42 network because refused TCP connections kept being read by the planner as "try a different route." Different domain, same pattern. An agent given a goal, a paid resource, and the instruction to keep going. A target that returns ambiguous responses. A planner that re-enters the same step every cycle. A bill that climbs steadily, hour after hour, while the operator sleeps.
The interesting thing about these incidents is that the agents did not malfunction. They did exactly what their prompts told them to do. The failure was in the surrounding system: no spend ceiling, no circuit breaker on repeated identical actions, no human-readable heartbeat.
The current discourse around proactive agents treats "agent does too much" as a UX problem. The "Building effective agents" essay covers the call-and-response loop clearly enough on the design side. After reading these incident logs back to back, we think the first problem to solve is financial controls, with UX second.
What the logs actually showed
We dumped the agent's trajectory into a spreadsheet. Of 4,812 tool calls, 4,798 were to the same fetch_url endpoint with the same URL. The remaining fourteen were thirteen screenshot_page calls (used by the vision model to inspect the captcha) and a single write_database call from the first successful scrape, around 02:34.
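The spreadsheet step boils down to one grouping, and it is the first query worth running on any runaway agent. A sketch, assuming the trajectory has been exported as a list of dicts with `tool` and `args` keys; that export shape is our assumption, not the framework's format.

```python
import json
from collections import Counter


def summarise_trajectory(calls: list) -> list:
    """Return (action, count) pairs, most frequent first.

    An action is the tool name plus its canonicalised arguments, so thousands of
    identical fetch_url calls collapse into a single row.
    """
    counts = Counter(
        f"{call['tool']} {json.dumps(call.get('args', {}), sort_keys=True)}"
        for call in calls
    )
    return counts.most_common()


if __name__ == "__main__":
    trajectory = (
        [{"tool": "fetch_url", "args": {"url": "https://example.com/search?page=2"}}] * 5
        + [{"tool": "screenshot_page", "args": {}}] * 2
    )
    for action, count in summarise_trajectory(trajectory):
        print(count, action)
```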
The planner's reasoning steps, when we read them in order, looked like this:
Step 23: Page did not return the expected product grid. Retrying.
Step 24: Page still missing product grid. Possible network issue. Retrying with different proxy.
Step 25: Possible bot detection. Retrying with different proxy.
Step 26: Page did not return expected content. Retrying.
[continues for 4,770 more steps]
The planner never escalated. It never wrote "I have tried fifty times and the page consistently shows what looks like a captcha. I should stop and ask for help." It had no concept of "I have tried fifty times" because nothing in the loop incremented a counter that survived re-planning.
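What the loop lacked is a counter that lives in the runtime, outside the planner, keyed by the action itself rather than by the planning step. A minimal Python sketch; `RetryLedger` and `RetryLimitExceeded` are hypothetical names, not part of any framework, and the sketch assumes the runtime sees each tool name and its arguments before dispatching the call.

```python
import json
from collections import Counter


class RetryLimitExceeded(RuntimeError):
    """Raised when the same action has been attempted too many times in one run."""


class RetryLedger:
    """Counts attempts per (tool, arguments) across the whole run.

    It lives in the runtime rather than in the planner, so re-entering a
    planning step does not reset it.
    """

    def __init__(self, max_attempts: int = 10) -> None:
        self.max_attempts = max_attempts
        self._attempts = Counter()

    @staticmethod
    def action_key(tool: str, args: dict) -> str:
        # Canonical JSON, so argument order does not create a "new" action.
        return tool + ":" + json.dumps(args, sort_keys=True)

    def record(self, tool: str, args: dict) -> int:
        """Register one attempt and return its number; raise once the cap is exceeded."""
        key = self.action_key(tool, args)
        self._attempts[key] += 1
        if self._attempts[key] > self.max_attempts:
            raise RetryLimitExceeded(
                f"{tool} attempted {self._attempts[key]} times with identical arguments"
            )
        return self._attempts[key]


if __name__ == "__main__":
    ledger = RetryLedger(max_attempts=10)
    try:
        while True:  # simulates a planner that keeps re-entering the scrape step
            ledger.record("fetch_url", {"url": "https://example.com/search?page=2"})
    except RetryLimitExceeded as exc:
        print(exc)  # fetch_url attempted 11 times with identical arguments
```

Because the key is the tool plus its arguments, re-entering the scrape step and issuing the same fetch_url call again lands on the same counter, which is exactly the case a per-step cap misses.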
The four controls we now put on every agent
We do not think this kind of incident is preventable by writing a better prompt. The prompt is too far from the cost. The controls have to live one layer down, between the agent and the paid resource.
A hard budget at the proxy layer
Set a daily and an hourly cap in the proxy provider's dashboard. Bright Data, Oxylabs, Smartproxy and the rest all support this. Pick a number that is two or three times your normal daily usage, not ten times. The cap should trip before the bill hurts. Send the alerts to a channel a human actually watches, not a shared mailbox or the analyst's personal inbox.
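The provider-side cap is the one that counts, because it holds even when your own code is wrong. A runtime-side mirror of it is still worth having: it stops the agent with a clear error before the provider starts refusing traffic. A minimal sketch, assuming the runtime can estimate each call's cost before making it (for example from expected page size and your per-GB price); `SpendGuard` and `BudgetExceeded` are hypothetical names.

```python
import time
from collections import deque


class BudgetExceeded(RuntimeError):
    """Raised instead of making a paid call that would break a cap."""


class SpendGuard:
    """Rolling hourly and daily spend caps, checked before each paid call."""

    HOUR = 3_600
    DAY = 86_400

    def __init__(self, hourly_cap: float, daily_cap: float, clock=time.time) -> None:
        self.hourly_cap = hourly_cap
        self.daily_cap = daily_cap
        self.clock = clock          # injectable so tests need not wait an hour
        self._events = deque()      # (timestamp, amount), oldest first

    def _spent_since(self, cutoff: float) -> float:
        return sum(amount for ts, amount in self._events if ts >= cutoff)

    def charge(self, estimated_cost: float) -> None:
        """Record a call's estimated cost, or raise if it would exceed either cap."""
        now = self.clock()
        # Events older than a day no longer count toward either window.
        while self._events and self._events[0][0] < now - self.DAY:
            self._events.popleft()
        if self._spent_since(now - self.HOUR) + estimated_cost > self.hourly_cap:
            raise BudgetExceeded("hourly spend cap reached")
        if self._spent_since(now - self.DAY) + estimated_cost > self.daily_cap:
            raise BudgetExceeded("daily spend cap reached")
        self._events.append((now, estimated_cost))
```

A refused charge is not recorded, so the guard keeps refusing until the rolling window frees up room, which is the behaviour you want from a loop that will not stop on its own.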
A circuit breaker on identical actions
The agent runtime, not the agent itself, should detect that the last N tool calls were identical and that the last N responses were structurally similar, then refuse the next call. We typically set N to five for the first version and tune up. The runtime returns an error to the planner that reads "circuit breaker tripped: identical tool call repeated five times. Try a different approach or escalate." Most planners respond to this by stopping. The few that do not, you catch with the next control.
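A sketch of that breaker in Python. The "structurally similar" test is our assumption: it hashes the sequence of opening HTML tag names, which tends to match across captcha pages that differ only in tokens and timestamps. `IdenticalCallBreaker`, `CircuitOpen` and `structural_fingerprint` are hypothetical names.

```python
import hashlib
import json
import re
from collections import deque


class CircuitOpen(RuntimeError):
    """The runtime catches this and hands the message to the planner as a tool error."""


def structural_fingerprint(body: str) -> str:
    """Hash of the sequence of opening tag names in an HTML response.

    Captcha pages served to different IPs usually differ in tokens and
    timestamps but share one tag skeleton, so they fingerprint the same.
    """
    tags = re.findall(r"<\s*([a-zA-Z][a-zA-Z0-9-]*)", body)
    return hashlib.sha256(" ".join(t.lower() for t in tags).encode()).hexdigest()


class IdenticalCallBreaker:
    def __init__(self, n: int = 5) -> None:
        self.n = n
        self._history = deque(maxlen=n)  # last n (call key, response fingerprint) pairs

    @staticmethod
    def _key(tool: str, args: dict) -> str:
        return tool + ":" + json.dumps(args, sort_keys=True)

    def before_call(self, tool: str, args: dict) -> None:
        """Refuse a call that repeats the last n identical calls with same-shaped responses."""
        key = self._key(tool, args)
        if (
            len(self._history) == self.n
            and all(k == key for k, _ in self._history)
            and len({fp for _, fp in self._history}) == 1
        ):
            raise CircuitOpen(
                f"circuit breaker tripped: identical tool call repeated {self.n} times. "
                "Try a different approach or escalate."
            )

    def after_call(self, tool: str, args: dict, body: str) -> None:
        self._history.append((self._key(tool, args), structural_fingerprint(body)))
```

A different call passes straight through, so a planner that genuinely changes approach is not blocked; only the sixth identical attempt is refused.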
A wall-clock heartbeat to a human channel
Every fifteen minutes the agent posts a one-line status to Slack: current task, last successful tool call, number of tool calls in the current planning cycle. A human glancing at the channel during business hours catches the loop within a cycle. The right format is closer to a build server's progress line than to a chat message. Quiet success is fine. Silence for ninety minutes is not.
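A sketch of the heartbeat pieces, assuming a Slack incoming webhook, which accepts a JSON body with a `text` field; the webhook URL, the `RunStatus` fields and the function names are ours. The silence check belongs in a separate process such as a cron job, because a runtime that has crashed or hung cannot report its own silence.

```python
import json
import time
import urllib.request
from dataclasses import dataclass


@dataclass
class RunStatus:
    task: str
    last_ok_tool: str
    last_ok_at: float      # unix time of the last successful tool call
    calls_this_cycle: int  # tool calls in the current planning cycle


def heartbeat_line(s: RunStatus, now: float) -> str:
    """One build-server-style status line."""
    age_min = int((now - s.last_ok_at) // 60)
    return (
        f"[agent] task={s.task} last_ok={s.last_ok_tool} ({age_min} min ago) "
        f"calls_this_cycle={s.calls_this_cycle}"
    )


def is_silent_too_long(last_post_at: float, now: float, limit_s: float = 90 * 60) -> bool:
    """For the separate watcher: True once the run has been quiet past the limit."""
    return now - last_post_at > limit_s


def post_to_slack(webhook_url: str, text: str) -> None:
    """POST one message to a Slack incoming webhook."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()


if __name__ == "__main__":
    status = RunStatus("bol_sneakers", "write_database", time.time() - 3 * 3600, 4798)
    line = heartbeat_line(status, time.time())
    print(line)
    # post_to_slack(WEBHOOK_URL, line)  # WEBHOOK_URL: your own webhook (hypothetical)
```

A line reporting a last success three hours ago next to thousands of calls in the current cycle is the loop, visible at a glance.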
A separate kill switch for paid tools
The cheapest fix is also the simplest. Give every paid tool its own per-run quota that the agent cannot override. If the quota is 200 fetch_url calls and the agent makes its 201st, the runtime kills the run and pages the on-call. The agent's prompt does not even need to know the quota exists. This is the control that would have made the Thursday incident a footnote instead of a refund conversation.
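A minimal sketch of a per-run quota enforced in the runtime. The quota table, `PaidToolQuota`, `RunKilled` and the `page_on_call` hook are hypothetical; wire the hook to whatever pages your on-call.

```python
from collections import Counter
from typing import Callable, Dict


class RunKilled(RuntimeError):
    """Ends the run. The runtime must not hand this back to the planner as a tool error."""


class PaidToolQuota:
    def __init__(self, quotas: Dict[str, int], page_on_call: Callable[[str], None]) -> None:
        self.quotas = quotas              # e.g. {"fetch_url": 200}; unlisted tools are unmetered
        self.page_on_call = page_on_call  # hypothetical hook into your paging tool
        self.used = Counter()

    def check(self, tool: str) -> None:
        """Call before dispatching any tool. The planner never sees the quota."""
        if tool not in self.quotas:
            return
        self.used[tool] += 1
        if self.used[tool] > self.quotas[tool]:
            message = f"{tool} quota of {self.quotas[tool]} calls exhausted; run killed"
            self.page_on_call(message)
            raise RunKilled(message)
```

The important part is where `RunKilled` goes: the runtime should end the run on it, not return it to the planner as an ordinary error the planner can reason its way around.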
An agent that talks to a metered API is a financial system. Treat it like one: budgets, circuit breakers, heartbeats, kill switches. The prompt is not a control.
A note on captchas specifically
Captchas are not a transient network error and your agent should not treat them as one. If a page returns a captcha challenge twice in a row from two different IPs, the target site has decided your agent is a bot, and rotating proxies will not change its mind in the short term. The right behaviour is to stop, write the URL to a queue, and either escalate to a human or wait long enough that the site's risk score for the original signal has decayed.
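That rule fits in a few lines. A sketch, assuming one policy instance per URL and that the caller has already classified each response as captcha or not; `CaptchaPolicy` and its list-based queue are hypothetical stand-ins for a real queue. It stops on the second captcha in a row whichever exit served it, since a repeat from the same IP is no more promising than one from a fresh IP.

```python
from typing import List, Tuple


class CaptchaPolicy:
    """Per-URL decision after each response: keep going, or stop and queue for a human."""

    def __init__(self) -> None:
        self.needs_human: List[Tuple[str, List[str]]] = []  # stand-in for a real queue
        self._captcha_ips: List[str] = []                  # exits that got captchas in a row

    def on_response(self, url: str, exit_ip: str, is_captcha: bool) -> str:
        if not is_captcha:
            self._captcha_ips.clear()  # a real page ends the streak
            return "continue"
        self._captcha_ips.append(exit_ip)
        if len(self._captcha_ips) < 2:
            return "continue"  # one captcha may be noise; allow one retry
        # Second captcha in a row: the site has decided this is a bot, and
        # rotating exits will not change its mind in the short term.
        self.needs_human.append((url, list(self._captcha_ips)))
        self._captcha_ips.clear()
        return "stop"
```

Recording the exit IPs alongside the URL lets whoever picks up the queue see at a glance that rotation was already tried.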
The cheapest detection is a string match on the response body for the captcha vendor's domain (hCaptcha, reCAPTCHA, Cloudflare Turnstile, Bol.com's in-house challenge). The next-cheapest is a small classifier on a screenshot. Either one would have ended this run in under a minute.
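A sketch of the string-match approach. The hCaptcha, reCAPTCHA and Turnstile substrings are the hosts their widget scripts load from; the Bol.com entry is a hypothetical placeholder, to be replaced with a marker taken from your own captured responses.

```python
from typing import Optional

# Substrings that appear in pages embedding each vendor's challenge widget.
# The first four are the public hosts the widget scripts load from; the Bol.com
# entry is a hypothetical placeholder to replace with a marker from your own logs.
CAPTCHA_MARKERS = {
    "hcaptcha": "hcaptcha.com",
    "recaptcha": "google.com/recaptcha",
    "recaptcha (alt host)": "recaptcha.net",
    "turnstile": "challenges.cloudflare.com",
    "bol.com in-house": "replace-with-bol-challenge-marker",
}


def detect_captcha(body: str) -> Optional[str]:
    """Return the first vendor whose marker appears in the response body, else None."""
    lowered = body.lower()
    for vendor, marker in CAPTCHA_MARKERS.items():
        if marker.lower() in lowered:
            return vendor
    return None
```

On sites that embed a captcha widget on ordinary pages, a login form for example, a bare match will false-positive, so pair it with a check that the expected content (the product grid, here) is missing.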
What this firm did on the Thursday
We helped them write the refund request. Bright Data, to its credit, returned the bulk of the spend as account credit once the team produced the trajectory log and the timestamps. Not every provider does this, and the firm now treats the credit as a one-time courtesy rather than a strategy.
The team also rewrote the agent's scrape step. It now uses Bol.com's affiliate-feed API for catalogue data, falls back to a headless browser only for fields the API does not expose, and routes through residential proxies only when the headless attempt actually returns a non-captcha response. The fallback is gated by a per-task budget of 50 MB of residential traffic. If the budget trips, the agent writes a row to a needs_human table and stops.
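One reading of that flow as code: the API first, the residential-proxied headless fallback only for missing fields, a stop on the first captcha, and a task-wide 50 MB budget. Every callable passed in (`fetch_from_api`, `fetch_via_residential`, `extract_fields`, `is_captcha`, `write_needs_human`) is a hypothetical stand-in for the team's real clients, and decimal megabytes are assumed.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]


def scrape_task(
    product_ids: Iterable[str],
    required_fields: Tuple[str, ...],
    fetch_from_api: Callable[[str], Record],                  # affiliate-feed lookup (hypothetical)
    fetch_via_residential: Callable[[str], Tuple[str, int]],  # headless load: (html, bytes used)
    extract_fields: Callable[[str], Record],
    is_captcha: Callable[[str], bool],
    write_needs_human: Callable[[str, str], None],            # one row in the needs_human table
    budget_bytes: int = 50_000_000,                           # 50 MB per task, decimal units
) -> List[Record]:
    results: List[Record] = []
    used = 0
    for pid in product_ids:
        record = dict(fetch_from_api(pid))
        missing = [f for f in required_fields if f not in record]
        if missing:
            # Residential traffic is the only metered path, so the budget guards it alone.
            if used >= budget_bytes:
                write_needs_human(pid, "residential budget exhausted")
                break
            html, n_bytes = fetch_via_residential(pid)
            used += n_bytes
            if is_captcha(html):
                write_needs_human(pid, "captcha on headless fallback")
                break
            extracted = extract_fields(html)
            record.update({f: extracted[f] for f in missing if f in extracted})
        results.append(record)
    return results
```

The sketch stops the whole task at the first trip, as described above, rather than skipping ahead to the next product and spending more.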
The framework upgrade took two engineering days. Residential proxy spend in the first week after the change was €37.
The smallest thing you could do today
If you run a single autonomous agent that touches a paid API, open the provider's dashboard right now and set a daily spend cap at twice your average daily spend over the last week. That one change would have capped Thursday's incident at roughly €60. When we built the research and pricing agents for an Eindhoven client last quarter, the lesson we kept hitting was that the cheap controls (a cap, a counter, a heartbeat) work better than the clever ones. If you want help wiring those into your own AI agents without rebuilding the whole stack, that is the part of the work we do.
Key takeaway
An agent that talks to a metered API is a financial system. Budgets, circuit breakers, heartbeats and kill switches matter more than prompt wording.
FAQ
How did a single captcha cost €2,140?
The agent retried the same blocked page roughly 4,800 times in four hours, each time through a fresh residential IP. Residential proxies bill per gigabyte, and that volume of captcha-served pages adds up fast.
Why didn't the framework's retry cap save them?
The ten-attempt cap reset every time the planner re-entered the scrape step. Because the planner kept deciding the scrape still needed to happen, the counter never reached its limit.
What is the cheapest control to add tonight?
A daily spend cap in your proxy provider's dashboard, set at two or three times your normal usage. It would have capped this incident at roughly €60 instead of €2,140.
Should an agent ever retry a captcha?
No. Two captchas in a row from different IPs mean the site has classified your agent as a bot. The correct behaviour is to stop, queue the URL, and either escalate to a human or wait long enough for the site's risk score to decay.