Capping Claude at €800/seat: a CRM agent containment playbook
It's the fourth of the month and your CFO is staring at a Claude invoice that quietly jumped forty percent. Forty sales reps, one CRM agent, no per-seat ceiling. Here's how we fixed it.

It's the fourth of the month. Your finance lead opens last month's Anthropic invoice, scrolls to the line for the sales CRM agent, and the number is forty percent higher than April. The team didn't grow. The product didn't change. Two reps just figured out the agent will do their entire morning of research and outreach if they ask it nicely. The board meeting is on Thursday.
This is the conversation we had three weeks ago with a B2B services team running forty sales seats on a Claude-backed CRM agent. Their stack worked. The cost curve didn't.
What follows is the playbook we landed on: a hard cap of €800 per seat per month, no per-call hand-wringing, no surprise invoices, and a pipeline that stayed unbroken. The numbers and code are the ones we shipped, with the client name swapped.
Why a cap, and why now
If you scrolled Hacker News in late May, you saw the discussion about Uber's internal $1,500 a month AI tool ceiling. The interesting part wasn't the number. It was that a company with Uber's negotiating power decided a cap was the right shape of the answer, not a discount.
The same logic applies at smaller companies, only harder. With no cap, AI usage follows a brutal Pareto: the top two or three reps will burn sixty to eighty percent of your monthly Claude budget in the first ten days, and the bottom half will barely touch the agent at all. Your CFO sees the average and panics. Your top reps see no friction and accelerate. By month-end the line item looks nothing like the proof of concept you signed off on.
A per-seat cap fixes three things at once. It makes the bill predictable. It forces the conversation about which seats actually justify the agent. And it sets the stage for a degradation policy, because once "infinite" is off the table, you have to decide what happens when the budget runs out.
Pricing the seat at €800
We didn't pull €800 out of a hat. We worked backwards from three numbers.
First, the cost of the actions the agent actually does. A typical CRM sequence at this client looked like: pull a dozen signals on a prospect, draft a first-touch email, summarise a sales call transcript, and update the opportunity record. With Claude Sonnet and prompt caching turned on properly, that sequence cost about €0.31 to €0.45. Without caching, the same sequence cost €0.90 to €1.20. Caching matters. We'll come back to it.
Second, the cost of the rep's time the agent replaces. The client's loaded cost per sales hour was around €85. If the agent saved a rep two hours a day across twenty workdays, that's €3,400 of replaced labour per seat per month. Anything south of a quarter of that is a comfortable margin.
Third, the cost of the rep going off-script. The hard cap had to be loose enough that no honest day of work hit it. We modelled a top-decile day (sixty to eighty agent sequences) and a top-decile month (eighteen of those days). That ceiling came out to roughly €720. We rounded up to €800 to leave headroom for one-off escalations, which also made the ceiling easy to remember.
If your sequences cost a different amount, your number will be different. The shape of the question is the same.
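The three numbers above reduce to a few lines of arithmetic. The sketch below only re-derives figures quoted in this section; the constant and variable names are illustrative, and the last step is a back-calculation of the blended per-sequence cost that the €720 ceiling implies, not a figure from the client.

```typescript
// Figures quoted in this section. Names are illustrative, not from the shipped code.
const LOADED_COST_PER_HOUR_EUR = 85;
const HOURS_SAVED_PER_DAY = 2;
const WORKDAYS_PER_MONTH = 20;
const SEAT_CAP_EUR = 800;
const MODELLED_CEILING_EUR = 720;

// Value of the rep time the agent replaces, per seat per month.
const replacedLabourEur =
  LOADED_COST_PER_HOUR_EUR * HOURS_SAVED_PER_DAY * WORKDAYS_PER_MONTH; // 3400

// "South of a quarter of that" is the comfort threshold.
const comfortThresholdEur = replacedLabourEur / 4; // 850

// Top-decile month: eighteen days of sixty to eighty sequences.
const topMonthSequences = { low: 60 * 18, high: 80 * 18 }; // 1080 to 1440

// Blended cost per sequence implied by the modelled ceiling (a back-calculation).
const impliedCostPerSequence = {
  atHighVolume: MODELLED_CEILING_EUR / topMonthSequences.high, // 0.50
  atLowVolume: MODELLED_CEILING_EUR / topMonthSequences.low, // about 0.67
};

const capIsComfortable = SEAT_CAP_EUR < comfortThresholdEur; // true

console.log({
  replacedLabourEur,
  comfortThresholdEur,
  topMonthSequences,
  impliedCostPerSequence,
  capIsComfortable,
});
```

The implied €0.50 to €0.67 per sequence sits between the cached and uncached ranges quoted earlier, which is what you would expect if a top-decile month mixes warm-cache and cold-cache calls. Swap in your own measurements before trusting any of it.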
The middleware that owns the budget
The first rule we wrote: no Claude API key lives on a rep's laptop, in a browser extension, or in the CRM itself. Every agent call passes through a tiny proxy service that the CRM talks to. The proxy authenticates the seat, checks the budget, makes the upstream Anthropic call, records exactly what the call cost in cache reads, cache writes, fresh input, and output, and returns the response.
The proxy is about four hundred lines of TypeScript. The interesting bit is the budget gate. Everything else is plumbing.
// proxy/budget.ts
import { db } from "./db";
import { priceCall, type Usage } from "./pricing";

const SEAT_CAP_EUR = 800;  // hard monthly ceiling
const SOFT_WARN_AT = 0.80; // 80% triggers banner + downshift
const HARD_CUT_AT = 1.00;  // 100% blocks non-critical actions

// Assumption: billing cycles are calendar months in UTC, keyed as "YYYY-MM".
// Replace with your own cycle logic if your contract bills differently.
function currentBillingCycle(now: Date = new Date()): string {
  return now.toISOString().slice(0, 7);
}

export async function checkSeatBudget(
  seatId: string,
  actionTier: "routine" | "critical"
) {
  const cycle = currentBillingCycle();
  const row = await db.one(
    `select coalesce(sum(cost_eur), 0) as spent_eur
       from agent_calls
      where seat_id = $1 and cycle = $2`,
    [seatId, cycle]
  );
  // Postgres drivers commonly return numeric sums as strings; coerce before dividing.
  const spent_eur = Number(row.spent_eur);
  const ratio = spent_eur / SEAT_CAP_EUR;

  // At or past the cap, only critical actions get through.
  if (ratio >= HARD_CUT_AT && actionTier === "routine") {
    return { allow: false, reason: "seat_capped", spent_eur };
  }
  // Past the soft threshold, allow the call but ask the caller to downshift.
  if (ratio >= SOFT_WARN_AT) {
    return { allow: true, downshift: true, spent_eur };
  }
  return { allow: true, downshift: false, spent_eur };
}

// Persist the four usage meters and the priced cost for one upstream call.
export async function recordCall(seatId: string, usage: Usage) {
  const cost_eur = priceCall(usage);
  await db.none(
    `insert into agent_calls(seat_id, cycle, model, input_tokens,
                             output_tokens, cache_read_tokens,
                             cache_write_tokens, cost_eur, ts)
     values ($1, $2, $3, $4, $5, $6, $7, $8, now())`,
    [seatId, currentBillingCycle(), usage.model,
     usage.input_tokens, usage.output_tokens,
     usage.cache_read_tokens, usage.cache_write_tokens, cost_eur]
  );
}
Two design choices in there are worth pulling out.
One: the budget check returns a downshift flag, not just allow or deny. We use that to swap the model in flight without bothering the rep. Two: routine versus critical is decided by the calling action, not by the rep or the LLM. A "research a cold lead" call is routine. A "draft a counteroffer on a six-figure deal that is open in Salesforce right now" call is critical. The caller passes the tier.
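To make the first choice concrete, here is a minimal sketch of how a caller might act on the gate's answer. The `BudgetDecision` type mirrors the objects `checkSeatBudget` returns; `planCall` and `ModelAlias` are hypothetical names, and the policy shown (downshift means Haiku, full stop) is the simplest possible one rather than the full ladder described later.

```typescript
// Shape mirrors what checkSeatBudget returns in the code above.
type BudgetDecision =
  | { allow: false; reason: "seat_capped"; spent_eur: number }
  | { allow: true; downshift: boolean; spent_eur: number };

// Internal aliases, matching the keys used in the rate table.
type ModelAlias = "claude-sonnet" | "claude-haiku";

type CallPlan =
  | { proceed: false; reason: string }
  | { proceed: true; model: ModelAlias };

// Hypothetical helper: turn a budget decision into a plan for one request.
function planCall(decision: BudgetDecision, preferred: ModelAlias): CallPlan {
  if (decision.allow === false) {
    // Denied: surface the reason so the CRM can show the refusal screen.
    return { proceed: false, reason: decision.reason };
  }
  // Allowed: swap to the cheaper model when the gate asks for a downshift.
  const model: ModelAlias = decision.downshift ? "claude-haiku" : preferred;
  return { proceed: true, model };
}

console.log(planCall({ allow: true, downshift: true, spent_eur: 650 }, "claude-sonnet"));
```

Because the swap happens in the proxy, the rep's request never changes shape. Only the model behind it does.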
Token accounting that actually adds up
If you bill a seat by "total tokens" you are lying to the seat. Modern Claude usage has four meters, not one. Cache reads cost roughly a tenth of a fresh input token. Cache writes cost about a quarter more than a fresh input token. Output costs about five times an input token. Collapse those into a single dimension and you will either under-charge the seat (and blow through the cap before you notice) or over-charge it (and trigger soft warnings on calls that were nearly free).
The pricing helper does the four-meter math.
// proxy/pricing.ts
// EUR per million tokens. Pulled from your contract sheet.
// Update when Anthropic changes the price list.
// Check the currency: Anthropic publishes list prices in USD,
// so convert before treating list-price figures as EUR.
// Keys are internal aliases, not Anthropic model IDs.
const RATES = {
  "claude-sonnet": {
    input_fresh: 3.00,
    input_cache_read: 0.30,
    input_cache_write: 3.75,
    output: 15.00,
  },
  "claude-haiku": {
    input_fresh: 0.80,
    input_cache_read: 0.08,
    input_cache_write: 1.00,
    output: 4.00,
  },
} as const;

export type Usage = {
  model: keyof typeof RATES;
  input_tokens: number; // fresh, non-cached
  output_tokens: number;
  cache_read_tokens: number;
  cache_write_tokens: number;
};

// Price one call by applying each meter's own rate, then summing.
export function priceCall(u: Usage): number {
  const r = RATES[u.model];
  const per = (x: number) => x / 1_000_000;
  return (
    per(u.input_tokens) * r.input_fresh +
    per(u.cache_read_tokens) * r.input_cache_read +
    per(u.cache_write_tokens) * r.input_cache_write +
    per(u.output_tokens) * r.output
  );
}
Keep the rate table in a config file, not the code. The numbers change. The shape doesn't.
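The `Usage` type uses the proxy's own field names, so something has to translate from what the API returns. A minimal sketch of that mapping follows. The `usage` field names (`input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`) are the ones the Anthropic Messages API reports; `toUsage` and `aliasFor` are hypothetical helpers, and the substring match on the model ID is an assumption you would replace with an explicit lookup.

```typescript
// Subset of the `usage` object on an Anthropic Messages API response.
// The cache fields may be absent or null when caching is not in play.
type AnthropicUsage = {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
};

// Mirrors the Usage type from proxy/pricing.ts.
type ModelAlias = "claude-sonnet" | "claude-haiku";
type Usage = {
  model: ModelAlias;
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens: number;
  cache_write_tokens: number;
};

// Hypothetical: map an API model ID onto a rate-table alias.
// Throwing on unknown models avoids silently pricing a call at zero.
function aliasFor(apiModel: string): ModelAlias {
  if (apiModel.includes("haiku")) return "claude-haiku";
  if (apiModel.includes("sonnet")) return "claude-sonnet";
  throw new Error(`No rate configured for model: ${apiModel}`);
}

// Translate the API's usage block into the proxy's four meters.
function toUsage(apiModel: string, u: AnthropicUsage): Usage {
  return {
    model: aliasFor(apiModel),
    input_tokens: u.input_tokens,
    output_tokens: u.output_tokens,
    cache_read_tokens: u.cache_read_input_tokens ?? 0,
    cache_write_tokens: u.cache_creation_input_tokens ?? 0,
  };
}

console.log(
  toUsage("example-sonnet-model", {
    input_tokens: 500,
    output_tokens: 400,
    cache_read_input_tokens: 30_000,
  })
);
```

Failing loudly on an unmapped model is deliberate: an unpriced call is an invisible hole in the seat's budget.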
Caching is the budget multiplier
The single biggest lever in this whole exercise is prompt caching. The CRM agent reads the same thirty-thousand-token bundle on every call: the company's sales playbook, the objection-handling guide, the ICP definition, and the last week of internal threads about pricing. None of that changes between calls. All of it has to be in the prompt for the agent to draft good outreach. Without caching, you pay the input token rate on thirty thousand tokens every single call.
With caching, you pay the cache write rate once to populate the cache, and the cache read rate on every call that hits it. The default cache lifetime is five minutes and it resets on every hit, so steady traffic keeps the cache warm. In our measurements on this client, that pulled the median sequence cost from about €1.05 down to €0.38. A sixty-four percent reduction in spend, before any other optimisation.
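To see where the saving comes from, here is the arithmetic for the shared prefix alone, as a sketch. It uses the thirty-thousand-token bundle and the Sonnet rates from the table above; `prefixCost` is an illustrative name. It covers only the stable prefix, not fresh input or output, so it will not reproduce the €1.05 and €0.38 sequence figures.

```typescript
const PREFIX_TOKENS = 30_000;

// Sonnet rates per million tokens, copied from the rate table above.
const SONNET = {
  input_fresh: 3.0,
  input_cache_read: 0.3,
  input_cache_write: 3.75,
};

const perMillion = (tokens: number, rate: number): number =>
  (tokens / 1_000_000) * rate;

// Cost of sending the stable prefix on `calls` consecutive calls.
// Cached: one write to populate the cache, then reads while it stays warm.
function prefixCost(calls: number, cached: boolean): number {
  if (calls <= 0) return 0;
  if (!cached) {
    return calls * perMillion(PREFIX_TOKENS, SONNET.input_fresh);
  }
  return (
    perMillion(PREFIX_TOKENS, SONNET.input_cache_write) +
    (calls - 1) * perMillion(PREFIX_TOKENS, SONNET.input_cache_read)
  );
}

for (const calls of [1, 2, 10]) {
  console.log(
    calls,
    prefixCost(calls, false).toFixed(4),
    prefixCost(calls, true).toFixed(4)
  );
}
```

A single cached call costs more than an uncached one, because the write rate is higher than the fresh rate. The second call already puts you ahead, and by ten calls the prefix costs roughly a fifth of the uncached figure.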
If you have not turned on prompt caching for a high-volume agent, do that before you negotiate your contract. It will change the number you sign for.
Two practical notes on the cache. First, structure the prompt so that the stable parts come first and the volatile parts come last. The cache breaks the moment the prefix changes. Second, expect a cache miss on the first call after any idle gap longer than the five-minute lifetime. A bursty team of forty reps will mostly keep the cache warm during business hours and mostly let it expire overnight, which is fine because nobody is working.
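The first note translates directly into how the request body is laid out. The sketch below builds a Messages API request with the stable documents as `system` blocks and a `cache_control` marker on the last of them, leaving the per-call task in the user message. The types cover only the fields used here, `buildRequest` is a hypothetical helper, and the model name and `max_tokens` value are placeholders.

```typescript
// Minimal request-body types for the Anthropic Messages API (only fields used here).
type TextBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

type MessagesRequest = {
  model: string;
  max_tokens: number;
  system: TextBlock[];
  messages: { role: "user" | "assistant"; content: string }[];
};

// Hypothetical helper: stable docs first, volatile task last.
function buildRequest(
  model: string,
  stableDocs: string[],
  volatileTask: string
): MessagesRequest {
  // The cache_control marker goes on the final stable block, so the cached
  // prefix covers every stable document and nothing that changes per call.
  const system = stableDocs.map((text, i): TextBlock =>
    i === stableDocs.length - 1
      ? { type: "text", text, cache_control: { type: "ephemeral" } }
      : { type: "text", text }
  );
  return {
    model,
    max_tokens: 1024, // placeholder
    system,
    messages: [{ role: "user", content: volatileTask }],
  };
}

const request = buildRequest(
  "your-model-id", // placeholder
  ["<sales playbook>", "<objection-handling guide>", "<ICP definition>"],
  "Draft a first-touch email for the prospect described below."
);
console.log(JSON.stringify(request, null, 2));
```

Keep the order of the stable documents fixed as well. Reordering them changes the prefix just as surely as editing them does.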
Soft warnings, hard cuts, and the override pool
The cap is not just a number. It's a sequence of behaviours triggered at sixty, eighty, ninety-five, and one hundred percent of seat spend.
At sixty percent, nothing user-visible happens. We log a marker so the analytics dashboard can colour the seat amber if the cycle is only halfway through. Managers see the chart. Reps don't.
At eighty percent, the proxy starts returning downshift: true. The CRM acts on it for routine work, swapping Sonnet for Haiku on background research and short summaries. Drafts that the rep will read and edit still use Sonnet. The rep sees a small banner in the agent sidebar: "You've used €640 of your €800 cap. Background tasks will use a smaller model for the rest of the cycle." No drama.
At ninety-five percent, the remaining Sonnet calls get downshifted too, including the drafts the rep reads and edits. The banner turns from neutral to warning colour, and the agent appends a one-line note to every draft: "Generated on the lite model to stay inside your monthly cap."
At one hundred percent, routine actions return a calm refusal: "You've used your AI budget for June. Your manager can extend it from Settings. Critical client actions on open deals still go through." Critical actions, the ones tagged at the caller, still execute, but they bill into a shared override pool with a daily ceiling of its own and a Slack ping to the sales ops lead. We sized the override pool at twelve percent of the headline budget, which has covered every legitimate over-run in the three cycles since.
One thing to be careful with. If the hard cut also blocks critical actions, your reps will start spinning up personal Anthropic accounts on their own credit cards within a week. Ask anyone who has tried to ban shadow IT. The override pool exists to keep the cap honest without pushing the team off-platform.
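The ladder described in this section can be sketched as three small pure functions. The thresholds are the ones named above; the rung names, `rungFor`, `dispositionFor`, and `routineModel` are all illustrative, and the sketch deliberately leaves out the override pool's daily ceiling and which model critical calls use past the cap.

```typescript
type Rung =
  | "normal"
  | "watch" // 60%: log a marker for the dashboard, nothing rep-visible
  | "downshift_background" // 80%: background tasks move to Haiku
  | "downshift_all" // 95%: rep-facing drafts move to Haiku too
  | "capped"; // 100%: routine actions refused

type Tier = "routine" | "critical";
type Model = "claude-sonnet" | "claude-haiku";

// Map seat spend onto a rung of the ladder.
function rungFor(spentEur: number, capEur: number): Rung {
  const ratio = spentEur / capEur;
  if (ratio >= 1.0) return "capped";
  if (ratio >= 0.95) return "downshift_all";
  if (ratio >= 0.8) return "downshift_background";
  if (ratio >= 0.6) return "watch";
  return "normal";
}

// Decide who pays for the call, or whether it is refused.
function dispositionFor(
  rung: Rung,
  tier: Tier
): "bill_seat" | "bill_override_pool" | "refuse" {
  if (rung !== "capped") return "bill_seat";
  return tier === "critical" ? "bill_override_pool" : "refuse";
}

// Pick the model for a routine call while the seat is still under the cap.
function routineModel(
  rung: Exclude<Rung, "capped">,
  repFacingDraft: boolean
): Model {
  if (rung === "downshift_all") return "claude-haiku";
  if (rung === "downshift_background") {
    return repFacingDraft ? "claude-sonnet" : "claude-haiku";
  }
  return "claude-sonnet";
}

console.log(rungFor(640, 800), dispositionFor("capped", "critical"));
```

Keeping these as pure functions of spend, tier, and draft type makes the policy easy to unit-test and easy to explain to a rep who asks why a draft came back on the smaller model.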
What the rep actually sees
The visible surface inside the CRM is three things. A small chip in the agent sidebar showing spend, cap, and days remaining in the cycle. A banner that appears at eighty percent and changes at ninety-five. And a refusal screen that is short, calm, and tells the rep exactly which two clicks lead to "ask my manager for a bump".
// crm/AgentBudgetChip.tsx
<Chip tone={tone(spent, cap)}>
  €{Math.round(spent)} / €{cap} used · {daysLeft} days left
</Chip>
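The chip snippet relies on a `tone` helper and a `daysLeft` value that are not shown. One plausible way to fill them in, as a sketch: the tone names and the choice to follow the banner thresholds (eighty and ninety-five percent) are assumptions, as is the calendar-month billing cycle in UTC.

```typescript
type Tone = "neutral" | "notice" | "warning";

// Assumed mapping: the chip follows the same thresholds as the banner.
function tone(spent: number, cap: number): Tone {
  const ratio = spent / cap;
  if (ratio >= 0.95) return "warning";
  if (ratio >= 0.8) return "notice";
  return "neutral";
}

// Days remaining after today, assuming cycles are calendar months in UTC.
function daysLeftInCycle(now: Date): number {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();
  // Day 0 of the next month is the last day of the current month.
  const lastDay = new Date(Date.UTC(year, month + 1, 0)).getUTCDate();
  return lastDay - now.getUTCDate();
}

const fourthOfJune = new Date(Date.UTC(2025, 5, 4));
console.log(tone(640, 800), daysLeftInCycle(fourthOfJune));
```

If your contract bills on an anniversary date rather than the first of the month, `daysLeftInCycle` is the piece to replace.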
Reps stopped asking us about cost within a week. The chip answered the question before they thought to ask it.
The dashboard the CFO actually opens
The internal-facing dashboard is the other half. The proxy database makes this trivial because every call is row-level: seat, action tier, model, cost. Three charts pay for the whole exercise. Spend per seat over the cycle, sorted descending. Spend per action type, to see if any one workflow is eating disproportionate budget. And the override pool burn, with a red line at the daily ceiling.
The spend-per-action chart is the one that turns the cap from a finance defence into a product conversation. When the team sees that "research a cold lead" is eating thirty-five percent of the agent spend and converting at four percent, the next sprint isn't about budget. It's about a cheaper research workflow.
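The first two charts are plain group-and-sum queries over the call log. A minimal in-memory sketch follows; in practice this would be SQL against `agent_calls`. Note the assumption: the rows here carry an `action_type` field, which the insert statement shown earlier does not record, so you would need to add that column. `sumBy` and `shareBy` are illustrative names, and the sample rows are made up.

```typescript
// One row per agent call. `action_type` is an assumed extra column.
type CallRow = { seat_id: string; action_type: string; cost_eur: number };

// Total spend per key, sorted descending.
function sumBy(
  rows: CallRow[],
  key: "seat_id" | "action_type"
): [string, number][] {
  const totals = new Map<string, number>();
  for (const row of rows) {
    totals.set(row[key], (totals.get(row[key]) ?? 0) + row.cost_eur);
  }
  return Array.from(totals.entries()).sort((a, b) => b[1] - a[1]);
}

// Each key's share of total spend, as a fraction between 0 and 1.
function shareBy(
  rows: CallRow[],
  key: "seat_id" | "action_type"
): [string, number][] {
  const total = rows.reduce((sum, row) => sum + row.cost_eur, 0);
  if (total === 0) return [];
  return sumBy(rows, key).map(
    ([k, v]): [string, number] => [k, v / total]
  );
}

// Made-up sample data for illustration only.
const sample: CallRow[] = [
  { seat_id: "A", action_type: "research", cost_eur: 2 },
  { seat_id: "B", action_type: "draft", cost_eur: 1 },
  { seat_id: "A", action_type: "draft", cost_eur: 3 },
  { seat_id: "C", action_type: "research", cost_eur: 4 },
];

console.log(sumBy(sample, "seat_id"));
console.log(shareBy(sample, "action_type"));
```

Join the per-action share against conversion data from the CRM and you have the comparison that drives the workflow conversation.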
Rolling the cap out without a revolt
The biggest implementation risk wasn't technical. It was the rollout email. If the reps read the cap as a cut, you have a morale problem before the first Sonnet call gets downshifted.
We staged the launch over three weeks. Week one, the proxy ran in shadow mode: full accounting, no blocks, no rep-visible UI. We used that week to confirm the per-seat distribution actually matched our model, and to find two power users whose workflow needed the override pool from day one instead of day fifteen.
Week two, the chip appeared in the sidebar but the cap was set at €1,200, deliberately above the highest observed spend. Reps got a heads-up note that explained the four-meter pricing in plain language and named the override path. Nobody hit the ceiling.
Week three, the cap dropped to €800 and the soft warnings turned on. We pre-briefed the two power users individually and offered to run the first week of their cycle on a shared budget so they could shape their habits without the chip going amber on day three. Both took the offer. By cycle three, neither needed it.
How we ship this
When we built the CRM agent for this team, the problem we ran into was that two power users were burning seventy percent of the monthly Claude spend in the first eight days of every cycle, and nobody noticed until the invoices landed. We solved it with the four-meter accounting, the soft-to-hard cap ladder, and the override pool described above, and the bill has come in within five percent of forecast for three cycles running. If you want help wiring this kind of guardrail into your own AI agents, that's roughly the shape of the work.
The smallest thing you can do today: open last month's Anthropic invoice, divide the total by the number of people who actually used the agent, and see whether the number surprises you. If it does, your cap is already overdue.
Key takeaway
A per-seat AI cap with downshift before denial keeps the invoice predictable and the pipeline moving.
FAQ
Should the cap be per-seat or pooled across the team?
Per-seat for the headline cap. Pooled caps reward heavy users and punish careful ones, and you lose the per-seat data your CFO needs. Pool the override budget, not the headline cap.
What happens if a rep hits the cap in the middle of a live deal?
Routine actions are blocked. Critical actions, the ones the calling code tags as deal-stage work, still go through and bill to a shared override pool with a daily ceiling and an audit log.
How much does prompt caching actually save?
On this client, caching cut the median sequence cost from about €1.05 to €0.38, a sixty-four percent reduction. Your number depends on how stable your prompt prefix is.
Why a hard cap instead of just an alert?
Alerts get muted. A hard cap forces the cost conversation onto your calendar instead of into your inbox. You can still grant overrides; the difference is the default.