AI agent pricing: why we killed our fixed-price contracts
Fixed-scope contracts work when the work is predictable. Customer-facing AI agents are not. We tested three pricing models on six 2025 projects to find one that survived production.

The contract was three pages and very clean. Discovery, build, launch, two weeks of bug-fixing, sign-off. Fixed price, payable in three tranches. We had shipped fourteen websites on this template and they had all closed in the green.
This one was a chat agent for a Dutch ferry operator. Eight weeks later we were past sign-off, the model had been deprecated mid-project, the client had added a refund-policy edge case nobody had mentioned in scope, and the booking-API timeout was eating ten percent of conversations. We honored the price. We lost forty hours of senior time. The client was thrilled. We were not.
That was project three of 2025. By project six we had stopped writing fixed-price contracts for anything that touches a customer over chat, voice, or email.
Customer-facing agents are not websites
A website ships and then mostly sits. Copy gets tweaked, a section gets added, but the surface area of "what can go wrong" is bounded by the pages you built. You can scope it.
A customer-facing agent has three properties that fixed-scope contracts cannot survive.
The model is moving under you. Between January and November 2025 we shipped on four different Claude versions and three OpenAI ones. Each migration meant rewriting prompts that had taken weeks to calibrate. Anthropic publishes a deprecation schedule, and it is honored, but a twelve-month window is a twelve-month window. Your contract is six.
The edges are infinite. A booking flow has, generously, twenty named states. A customer typing a question has none. Every week brings a class of input nobody scoped. "Can I bring my dog if he is in a cage but the cage is in the car?" was an actual ferry question. The agent had to learn it.
Trust is built post-launch, not at launch. The first month of a live agent is the only month that matters for tone, escalation, and refusal behavior. That is exactly when fixed-price contracts have already paid out and the team has moved on.
Model one: fixed price per milestone
This is what we tried first, on three of the six projects. Split the work into discovery, build, integration, and stabilization. Bill at the end of each.
It feels safe. It is not. The unit you are billing for ("integration done") is not something you can measure objectively with an agent. Done against the spec, sure. Done against actual customer behavior, no. We hit sign-off on every milestone for the ferry project. The agent was still hallucinating refund amounts in week ten.
Fixed milestones also push the wrong work first. Discovery becomes a paper exercise because you cannot see the agent yet. The real spec emerges in week three when you read your first hundred transcripts. By then the contract has already been agreed.
Model two: per-conversation outcome pricing
We tested this on two projects. €0.40 per resolved conversation, €0 per escalation. The pitch is hard to refuse: client only pays for value, we are aligned on quality.
The pitch hides three problems.
First, "resolved" is a label the agent assigns to itself. We tried human review, sampling, and downstream signal (did the customer email again within 48 hours?). All three were defensible. None were cheap. The labeling cost on one project was eighteen percent of the line item.
Second, the unit economics are brutal at the start. The first thousand conversations contain the prompt rewrites, the hallucination patches, and the gotcha-fixes. Charging per conversation means you eat all of that before any revenue lands. We did the math on project five and discovered we needed seventeen weeks of live traffic to break even on the prompt engineering alone.
Third, clients hate it. Not the model itself. The invoice. A bill that fluctuates by thirty percent month over month, indexed to traffic they cannot directly control, generates more finance-team friction than the savings justify. Two clients asked to switch back to a flat rate within sixty days.
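The break-even problem is simple arithmetic, and it is worth running before you quote a per-conversation rate. Here is a minimal sketch in Python. Only the €0.40 price and the eighteen percent labeling share come from the projects above; the upfront cost, weekly traffic, and resolution rate are made-up inputs, and the function name is ours for illustration.

```python
import math


def weeks_to_break_even(
    upfront_cost_eur: float,
    weekly_conversations: int,
    resolution_rate: float,
    price_per_resolved_eur: float = 0.40,
    labeling_share: float = 0.0,
) -> int:
    """Weeks of live traffic before per-conversation revenue covers upfront work.

    labeling_share is the fraction of revenue spent deciding which
    conversations count as "resolved" (human review, sampling, and so on).
    """
    if not 0.0 <= resolution_rate <= 1.0:
        raise ValueError("resolution_rate must be between 0 and 1")
    if not 0.0 <= labeling_share < 1.0:
        raise ValueError("labeling_share must be in [0, 1)")

    weekly_revenue = weekly_conversations * resolution_rate * price_per_resolved_eur
    net_weekly = weekly_revenue * (1.0 - labeling_share)
    if net_weekly <= 0:
        raise ValueError("no net revenue, so break-even is never reached")
    return math.ceil(upfront_cost_eur / net_weekly)


# Hypothetical inputs: 6,000 EUR of prompt engineering, 1,500 conversations
# a week, 70% of them resolved without escalation.
print(weeks_to_break_even(6000, 1500, 0.70))                       # 15
print(weeks_to_break_even(6000, 1500, 0.70, labeling_share=0.18))  # 18
```

The second call shows how labeling overhead stretches the payback period even when traffic and quality stay the same.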
Model three: weekly retainer with a cap
This is what we landed on, and what runs on all four agents we shipped in Q4. The structure is small.
A fixed weekly retainer covers ongoing work: prompt iteration, transcript review, integration patches, model migrations. An hours-per-week cap, agreed up front, limits that work; above it we either renegotiate or push the overflow to the next sprint. A separate one-time fee covers the initial build, scoped to "first version live in staging."
Three things this fixes.
The cadence matches the work. Customer-facing agents change weekly because customer behavior changes weekly. A weekly billing rhythm makes that legible to the client's finance team instead of hiding it inside a fixed-scope re-quote every quarter.
The cap protects both sides. Without one, retainer agreements drift into "we are always available," which is unprofitable for us and creates passive dependency for the client. With a cap, both parties have a real number to push back on when scope creeps.
The build fee covers the build. Initial setup is genuinely fixed-scope work: provision infrastructure, write the system prompt, integrate the booking API, ship to staging. Pricing that part flat is honest. Pricing the next twelve months of customer contact flat is not.
What the contract actually looks like
Two pages. Three numbers: build fee, weekly rate, weekly hour cap. Two clauses: what happens when we hit the cap (rolled, renegotiated, or escalated), and what happens when the underlying model is deprecated (covered under the retainer, no separate change order).
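To make the three numbers and the cap clause concrete, here is a rough Python sketch of how a week settles under this kind of contract. It is an illustration under our own assumptions, not contract language: the class and field names are hypothetical, the euro amounts are invented, and we assume the weekly rate is billed in full whether or not the cap is reached.

```python
from dataclasses import dataclass
from enum import Enum


class CapPolicy(Enum):
    ROLL = "roll"                # overflow moves to the following week
    RENEGOTIATE = "renegotiate"  # overflow waits for a new agreement
    ESCALATE = "escalate"        # overflow is raised with the client


@dataclass(frozen=True)
class RetainerTerms:
    build_fee_eur: float       # one-time, "first version live in staging"
    weekly_rate_eur: float     # flat amount billed every week
    weekly_hour_cap: float     # agreed ceiling on hours per week
    cap_policy: CapPolicy = CapPolicy.ROLL
    migration_in_retainer: bool = True  # deprecations need no change order


@dataclass(frozen=True)
class WeekResult:
    billed_eur: float
    hours_worked: float
    hours_carried: float         # overflow pushed into next week
    needs_client_decision: bool  # True when overflow cannot simply roll


def settle_week(terms: RetainerTerms, requested_hours: float,
                carried_in: float = 0.0) -> WeekResult:
    """Apply the weekly cap to this week's demand plus any carried hours."""
    demand = requested_hours + carried_in
    worked = min(demand, terms.weekly_hour_cap)
    overflow = demand - worked
    if overflow > 0 and terms.cap_policy is CapPolicy.ROLL:
        return WeekResult(terms.weekly_rate_eur, worked, overflow, False)
    return WeekResult(terms.weekly_rate_eur, worked, 0.0, overflow > 0)


# Hypothetical terms: 12,000 EUR build, 1,800 EUR per week, ten-hour cap.
terms = RetainerTerms(12000.0, 1800.0, 10.0)
week1 = settle_week(terms, requested_hours=14.0)
week2 = settle_week(terms, requested_hours=5.0, carried_in=week1.hours_carried)
print(week1.hours_worked, week1.hours_carried)  # 10.0 4.0
print(week2.hours_worked, week2.hours_carried)  # 9.0 0.0
```

The invoice stays flat in both weeks. What moves is the backlog, which is the number both sides can actually argue about.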
We borrowed the shape of this from Basecamp's Shape Up approach to bounded work, though we cap the week rather than the cycle. Clients who sign it tend to be the ones who already understand that an agent is software in production, not software shipped and done.
If your AI agent contract has a sign-off date, you have not priced for the agent. You have priced for a website that happens to talk.
The studio-wide pattern
The pattern is not unique to us. Every studio we know of that shipped production agents in 2025 ran the same experiment, in some order. We are publishing the result because the fixed-price agent contract is the single most common reason we see other agencies quietly stop offering agent work after one or two projects. The economics break, the client relationship sours, and the team concludes "agents are not for us." It was the pricing, not the agents.
When we rebuilt the ferry agent under the new structure, the one snag was that finance teams needed the cap clause explained twice before they signed. We solved it by putting a one-paragraph plain-language summary at the top of the contract. The work itself ships better, faster, and at a margin we can defend. If you are scoping customer-facing AI agents right now, this is the part that matters more than which model you pick.
A five-minute audit
Open the last contract you signed for an AI agent. Find the section that covers what happens when the underlying model is deprecated. If it is not there, or if it lives in a generic "change order" clause, you are carrying the deprecation risk yourself. That is the smallest thing worth fixing this week.
Key takeaway
Fixed-price contracts assume the work is bounded. Customer-facing AI agents are not bounded, so the contract has to bill the cadence instead.
FAQ
Should every AI agent contract use weekly retainer + cap?
For customer-facing agents (chat, voice, email), in our experience, yes. For internal-only agents with a bounded process, fixed price still works because the inputs are bounded too.
What hour cap do you typically set?
Eight to twelve hours per week for one live agent in steady state. Higher for the first month after launch, then negotiated down once the prompt and escalation behavior stabilize.
How do you handle model deprecations under the retainer?
Covered under the cap. We treat model migration as ongoing maintenance, not a change order. If a migration needs more than the cap, we surface it before the work starts.
Doesn't per-conversation pricing align incentives better?
In theory, yes. In practice, the labeling cost and revenue lag make it worse for both sides during the first six months. We may revisit it at higher conversation volumes.