AI seat caps: what Uber's $1,500 means inside an SME CRM
An ops manager opens a Slack alert: one rep burned 1.2 million tokens this week. Uber's $1,500 AI cap reads differently once you've stared at that line.

Last Thursday at 16:00 Amsterdam time, an ops manager at a 40-person staffing agency we work with got a Slack ping from her CRM agent. One sales rep had used 1.2 million tokens that week. She didn't know what 1.2 million tokens cost, what it bought, or whether to be worried. The rep was their top biller. He'd closed three deals. The bill, when she ran the math, was €38.
That same week, Uber told its engineers their internal AI tools were capped at $1,500 a month per seat. The internet found that number remarkable. It's a useful signal, but not for the reason most of the commentary picked it up.
The number behind the headline
Uber's cap is for engineers using coding agents at the heaviest end of the curve. Long autonomous tasks, parallel agents, large repo reads. The $1,500 line sits above what most teams spend in practice but below where the truly degenerate cases pile up. It's the point at which a power user is pushing the model eight hours a day on real work, and above which you're probably running a script or burning tokens by accident.
For an SME running a CRM agent across eight sales reps, the question isn't whether your seats cost $1,500. They almost certainly won't. The question is what your usage curve looks like, and where the failure modes sit.
What a sales seat actually burns
A CRM agent that handles inbox triage, call summaries, lead enrichment, and draft replies for a single sales rep eats tokens in predictable shapes. Roughly:
- A summary of a 30-minute call: about 8k input, 1k output.
- A draft follow-up email with context from the last three touches: about 5k input, 400 output.
- A lead enrichment pass that reads the CRM record and proposes next steps: about 12k input, 800 output.
Multiply by a busy day (5 calls, 30 follow-ups, 20 enrichments) and the math lands at roughly:
input: 5 × 8k + 30 × 5k + 20 × 12k = 430k tokens
output: 5 × 1k + 30 × 0.4k + 20 × 0.8k = 33k tokens
At current Anthropic Claude Sonnet pricing (around $3 per million input, $15 per million output), that's roughly $1.79 a day (0.43 × $3 plus 0.033 × $15), or about €40 a month per active seat over a month of roughly 22 working days. Multiply by eight reps and you're at €320 a month for the whole team.
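The arithmetic above is easy to keep as a small script, so the estimate can be rerun when call volumes or prices change. This is a minimal sketch: the token shapes and prices are the ones quoted in this post, while the names `CALL_SHAPES`, `daily_usage`, and `daily_cost_usd` are ours and purely illustrative.

```python
# Per-call token shapes from the list above: (input_tokens, output_tokens).
CALL_SHAPES = {
    "call_summary": (8_000, 1_000),
    "follow_up": (5_000, 400),
    "enrichment": (12_000, 800),
}

# Prices in USD per million tokens, as quoted in the text.
# Assumption: check the provider's current price list before relying on these.
INPUT_USD_PER_M = 3.0
OUTPUT_USD_PER_M = 15.0


def daily_usage(volumes):
    """Return (input_tokens, output_tokens) for a dict of call counts per day."""
    tokens_in = sum(CALL_SHAPES[kind][0] * n for kind, n in volumes.items())
    tokens_out = sum(CALL_SHAPES[kind][1] * n for kind, n in volumes.items())
    return tokens_in, tokens_out


def daily_cost_usd(volumes):
    """Price one day's usage for a single seat."""
    tokens_in, tokens_out = daily_usage(volumes)
    return tokens_in / 1e6 * INPUT_USD_PER_M + tokens_out / 1e6 * OUTPUT_USD_PER_M


busy_day = {"call_summary": 5, "follow_up": 30, "enrichment": 20}
print(daily_usage(busy_day))                 # (430000, 33000)
print(f"{daily_cost_usd(busy_day):.3f} USD per seat per day")
```

Swapping in your own call counts is the quickest way to see whether your happy path looks anything like ours.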
That's the happy path. Now consider the edge cases.
Where the bill actually breaks
The cost overruns we've seen across fourteen production agents all came from the same handful of patterns.
A retry loop nobody noticed: an agent calling a flaky API, retrying with exponential backoff, each retry carrying the full conversation context. One bug we shipped retried 47 times before the timeout caught it. Each retry cost roughly the same as the original call.
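One way to keep a loop like this from running 47 times is to give retries their own budget, counted in both attempts and tokens. The sketch below is a hypothetical wrapper, not the code we shipped. `call_with_retry_budget`, its defaults, and the idea of passing an estimated `tokens_per_attempt` are all assumptions for illustration.

```python
import time


class RetryBudgetExceeded(Exception):
    """Raised when retries would exceed the attempt or token budget."""


def call_with_retry_budget(call, tokens_per_attempt, max_attempts=4,
                           max_tokens=50_000, base_delay=0.5, sleep=time.sleep):
    """Retry `call` with exponential backoff, but stop once the budget is spent.

    call: any zero-argument function that raises on failure.
    tokens_per_attempt: estimate of the context resent on every attempt.
    Returns (result, tokens_spent). All defaults are illustrative.
    """
    spent = 0
    for attempt in range(max_attempts):
        # Every attempt resends the full context, so it is charged in full.
        if spent + tokens_per_attempt > max_tokens:
            raise RetryBudgetExceeded(f"token budget hit after {attempt} attempts")
        spent += tokens_per_attempt
        try:
            return call(), spent
        except Exception:
            if attempt == max_attempts - 1:
                break
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise RetryBudgetExceeded(f"gave up after {max_attempts} attempts, {spent} tokens")
```

The design choice worth copying is that the failure is loud: a raised `RetryBudgetExceeded` shows up in logs, whereas a loop that eventually succeeds on attempt 40 does not.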
A rep who discovered the agent could "research" prospects. He started asking it to read a LinkedIn export, the company's last quarterly report, and a Glassdoor review for every lead. His single-seat cost went from €40 to €290 a month before anyone noticed.
A long-running RAG retrieval that grabbed the wrong chunk size from production config. The agent was reading 60k tokens of context for every email draft instead of 5k. The output looked normal. The bill quietly doubled. Anthropic's writeup on contextual retrieval hints at why this kind of regression is easy to miss: the cost of a single "smart" agent answer is highly variable, and the variance compounds across a team.
None of these are crazy abuses. They're the boring failure modes of any system where consumption scales with a model call.
The dangerous cost mode is not the heavy user. It's the silent feedback loop. Build alerting around the derivative of spend, not the absolute number.
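Alerting on the derivative can be as plain as comparing the latest hour's spend for a seat with the median of the hours before it. This is a rough sketch under our own assumptions: `spend_rate_alert` and its `window`, `factor`, and `floor` parameters are hypothetical tuning knobs, not values from any production setup.

```python
from statistics import median


def spend_rate_alert(hourly_spend, window=24, factor=3.0, floor=0.05):
    """Return True when the latest hour's spend jumps relative to the recent rate.

    hourly_spend: spend per hour for one seat, oldest first (any currency).
    window, factor, floor: illustrative defaults, to be tuned per team.
    """
    if len(hourly_spend) < 2:
        return False  # not enough history to estimate a rate
    *history, latest = hourly_spend[-(window + 1):]
    baseline = median(history)
    # The floor keeps near-zero baselines (nights, weekends) from firing on pennies.
    return latest > max(factor * baseline, floor)
```

A seat that moves from 0.10 to 0.12 an hour stays quiet here. A seat that jumps from 0.10 to 0.90 fires, even though neither figure is near any monthly cap.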
Caps that fail safely
When we set per-seat limits in CRM agents, we don't set them at the median. We set them at multiples of the median, with two cutoffs.
A soft cap at 2x median: the agent posts a daily summary into the rep's Slack DM. "Yesterday you used 280k tokens, which is twice your usual. Mostly enrichment calls on the Acme Corp record." That's it. No block, just a sentence the rep can react to.
A hard cap at 5x median: the agent fails closed. New requests get a polite refusal and a link to the ops manager.
The soft cap is the one that does real work. It catches the LinkedIn-research pattern within 24 hours. It catches the retry-loop bug within an hour, if you wire the alert to the derivative. The hard cap is just the seatbelt.
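The two cutoffs reduce to a small decision function. A minimal sketch, assuming usage and median are measured in the same unit over the same period. `CapAction` and `cap_action` are illustrative names, and letting a seat with no baseline through is our assumption rather than a rule from this post.

```python
from enum import Enum


class CapAction(Enum):
    ALLOW = "allow"
    NOTIFY = "notify"   # soft cap: let the request through, tell the rep
    REFUSE = "refuse"   # hard cap: fail closed and point to the ops manager


def cap_action(usage, median_usage, soft_multiple=2.0, hard_multiple=5.0):
    """Map a seat's usage to an action, using multiples of its own median."""
    if median_usage <= 0:
        # Assumption: a new seat with no baseline is allowed, not blocked.
        return CapAction.ALLOW
    if usage >= hard_multiple * median_usage:
        return CapAction.REFUSE
    if usage >= soft_multiple * median_usage:
        return CapAction.NOTIFY
    return CapAction.ALLOW


print(cap_action(280_000, 140_000))  # CapAction.NOTIFY
```

Checking the hard cap first matters: any usage above 5x is also above 2x, and the stricter action has to win.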
This is closer to how Uber's number reads, if you squint. They didn't pick $1,500 because engineers cost $1,500. They picked it because somewhere above that line, the human is no longer in the loop.
The shape of a fair model
Most of the post-Uber commentary argued about whether per-seat AI pricing should be a vendor strategy. That's a vendor question. The buyer question is different. Do you know what one of your seats actually consumes, and does your contract or your in-house build give you a clean way to cap it?
If you're buying a CRM with a built-in AI agent and the vendor advertises "unlimited AI", that's not a feature. That's a bet they're making on your usage curve being shallow. When the curve isn't shallow, the price can jump sharply at renewal.
For the staffing agency, we ended up with this:
budget per active seat: €60/month soft, €150/month hard
alert: daily Slack summary if 24h spend > 2 × 7-day median
fallback: hard cap returns cached "we're at capacity, ping ops" reply
billing: ops manager sees per-rep dashboard, not raw token counts
The ops manager doesn't need to know what 1.2 million tokens means. She needs to know that her top rep is at 70% of his monthly soft cap on day 18, and that nobody else is above 40%. That's a sentence she can act on.
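The budget, the alert rule, and that operator-facing sentence fit in a few lines. This is a sketch of the idea, not the agency's actual code. The euro amounts and the `2 × 7-day median` rule come from the config above, while `SeatBudget`, `needs_daily_alert`, and `status_line` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class SeatBudget:
    soft_eur: float = 60.0    # monthly soft budget per active seat
    hard_eur: float = 150.0   # monthly hard budget per active seat


def needs_daily_alert(spend_24h, last_7_days):
    """Alert rule from the config: 24h spend > 2 x 7-day median."""
    return bool(last_7_days) and spend_24h > 2 * median(last_7_days)


def status_line(rep, month_to_date_eur, day_of_month, budget=SeatBudget()):
    """Turn raw spend into the sentence the ops manager actually reads."""
    pct = round(100 * month_to_date_eur / budget.soft_eur)
    return f"{rep} is at {pct}% of the monthly soft cap on day {day_of_month}"


print(status_line("Top rep", 42.0, 18))
# Top rep is at 70% of the monthly soft cap on day 18
```

Token counts never appear in the output. Translating tokens into euros happens upstream of the dashboard.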
When we built the inbox-triage and CRM-agent stack for that staffing agency, the thing we ran into was exactly this gap between token math and operator language. We solved it by translating the bill into the unit the buyer already uses: deals touched, follow-ups sent, meetings booked. The AI agent didn't change; the dashboard around it did.
If you do one thing this week: pull the last 30 days of API logs for whatever agent you're already running, sort by user, and look at the top 10% of the distribution. That single sort answers most of the questions in this post.
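That sort is a few lines once the logs are in a uniform shape. A minimal sketch, assuming each log row has already been mapped to a dict with the hypothetical keys `user`, `input_tokens`, and `output_tokens`. Real provider logs will use different field names and need a mapping step first.

```python
from collections import defaultdict
import math


def top_spenders(log_rows, fraction=0.10):
    """Sum tokens per user and return the top `fraction` of users, heaviest first.

    log_rows: iterable of dicts with the (assumed) keys "user",
    "input_tokens" and "output_tokens".
    """
    totals = defaultdict(int)
    for row in log_rows:
        totals[row["user"]] += row["input_tokens"] + row["output_tokens"]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    # Always keep at least one user so small teams still get an answer.
    keep = max(1, math.ceil(len(ranked) * fraction)) if ranked else 0
    return ranked[:keep]
```

On a team of eight, the top 10% is a single person, which is usually all you need to start the conversation.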
Key takeaway
The dangerous cost mode in an AI agent is not the heavy user. It's the silent feedback loop. Watch the derivative of spend per seat, not the absolute number.
FAQ
How much does an AI-powered CRM agent actually cost per seat per month?
For a sales rep doing call summaries, enrichment, and follow-up drafts, expect €30 to €80 per active seat on current Claude or GPT pricing. Edge cases push that 5 to 10 times higher.
Why did Uber land on $1,500/month as their AI cap?
That number sits above the heaviest legitimate engineer usage but below where bugs or runaway scripts pile up. It's a soft signal, not a vendor benchmark.
What's the first cost-runaway pattern to watch for in agents?
Silent retry loops. An agent that retries a flaky API five times per call, with full context each time, pays for six calls instead of one without changing anything user-visible.
Should I cap per seat or pool the budget across the team?
Per seat. Pooled budgets hide which user or which workflow is causing the bill. Per-seat caps let you see the distribution and fix the outlier without punishing the team.