Strategy
Content-agent retainers: where the margin quietly leaks
You sold a content-agent retainer at €4.500 a client. Three months in, the API bill is twice the line you budgeted. Here is where the margin actually went.

It is the second Tuesday of the month. You are looking at the per-client P&L for your content-agent product and one row is in red across all eight clients. You sold the retainer at €4.500 a month each. The Anthropic invoice for the heaviest client is €1.380, the editor’s tracked hours come in at another €1.100, and you have not yet counted the project manager’s status calls. Margin is thin enough that you would not survive one of those clients churning.
The original sales deck said the cost would amortise. It does not, because you priced a content agent and shipped a content pipeline: eight cold, separate, flagship-model pipelines that each rebuild brand voice from scratch on every draft.
What follows is a field guide to the fifteen specific places that margin leaks out of a Dutch MKB content-agent retainer in 2026. They are ranked by what it takes to plug them. The first nine are things an account lead can fix this afternoon by editing one prompt-template file. The last six are things that require a conversation with the client about what the retainer actually is.
The pricing illusion
The trap is that the unit you sold the client is “a published blog post in our voice,” but the unit your costs scale on is “a tokenised inference call.” The two units do not move together. A 1.200-word post can cost anywhere from €0,14 to €11 depending on which model wrote it, how many times the brand-voice doc was re-sent, how many edit rounds the editor triggered, and whether the image pass ran on every paragraph or once at the end.
Most MKB agencies we have audited ship the most expensive version of that range and price as if they shipped the cheapest. Here is where the difference goes.
Nine leaks an account lead can fix in one prompt-template file
1. Defaulting every call to a flagship model
Opus on the outline, on the draft, on the edit pass, on the social-cut, on the meta description. A Sonnet-tier model handles the outline, the brand-voice rewrite, the social-cut and the meta with no quality loss a human reader notices. Reserve the flagship tier for the one step that actually benefits, usually the first-draft long-form pass on technical clients. Anthropic’s published pricing alone explains a 5x cost delta between tiers.
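Per-step routing can live in one small table. The sketch below assumes your pipeline gives each step a name. The step names, MODEL_BY_STEP and pick_model are hypothetical, and the two model IDs are examples that you should check against Anthropic's current model list.

```python
# Hypothetical per-step routing table. Only the long-form first draft for
# technical clients goes to the flagship tier. Every other step uses Sonnet.
FLAGSHIP = "claude-opus-4-1"      # example model ID, check the current list
WORKHORSE = "claude-sonnet-4-5"   # example model ID, check the current list

MODEL_BY_STEP = {
    "outline": WORKHORSE,
    "first_draft": WORKHORSE,
    "voice_rewrite": WORKHORSE,
    "social_cut": WORKHORSE,
    "meta_description": WORKHORSE,
}


def pick_model(step: str, technical_client: bool = False) -> str:
    """Return the model ID for a pipeline step, defaulting to the cheaper tier."""
    if step == "first_draft" and technical_client:
        return FLAGSHIP
    # Unknown steps fall back to the workhorse tier, never the flagship.
    return MODEL_BY_STEP.get(step, WORKHORSE)
```

The design choice here is that the flagship is the exception you opt into, not the default you forget to change.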
2. Brand voice in the user message, not the system message
A brand-voice doc is typically 4–8k tokens of “we write like this, not like that.” If it is pasted into the user message alongside each brief, it never caches. The prompt cache only reuses an identical prefix up to a block marked for caching, and a per-brief user message is neither identical nor marked. Move the doc to the system message and mark it as a cacheable prefix. From the second call onward, it costs a fraction of the first. Anthropic’s prompt-caching docs spell out the mechanics. In practice this single change is worth €40–€120 per client per month on a four-post cadence.
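The sketch below shows the shape of the change, plus a back-of-envelope cost model. It assumes Anthropic's documented multipliers for the default five-minute cache: a cache write is billed at 1.25× the base input price and a cache read at 0.1×. The price per million tokens, the call count and both helper names are placeholders for your own numbers.

```python
def cached_system(brand_voice: str) -> list[dict]:
    """Brand voice as a system block marked as a cacheable prefix."""
    return [{
        "type": "text",
        "text": brand_voice,
        "cache_control": {"type": "ephemeral"},
    }]


def brand_voice_input_cost(
    voice_tokens: int,
    calls: int,
    price_per_mtok: float,
    cached: bool,
    write_multiplier: float = 1.25,
    read_multiplier: float = 0.10,
) -> float:
    """Input cost of sending the brand voice on `calls` calls.

    Assumes every call after the first lands inside the cache lifetime.
    """
    per_call = voice_tokens * price_per_mtok / 1_000_000
    if not cached or calls == 0:
        return per_call * calls
    # The first call writes the cache. The remaining calls read from it.
    return per_call * write_multiplier + per_call * read_multiplier * (calls - 1)
```

In the real call you would pass system=cached_system(voice) to client.messages.create. You can then confirm hits by checking usage.cache_read_input_tokens on the response.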
3. Re-sending the brand voice once per paragraph
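The pipeline does an outline, then iterates paragraph by paragraph, and each paragraph call re-sends the full brand voice. Caching softens this but does not remove it. Every call still pays the cache-read rate on the whole prefix. The default cache entry also expires after five minutes without a hit, so a run that stalls between paragraphs (an editor pause, a retry queue) pays to write the cache again. Batch the paragraph generation into one call, or write the full draft in one shot and edit per paragraph.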
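Here is a minimal sketch of the one-shot alternative, assuming the outline is a list of section headings. It builds one request body for the whole draft instead of one per paragraph, so the brand-voice prefix is sent once per post. build_draft_request, voice_tokens_billed and the prompt wording are hypothetical. The dict keys mirror the keyword arguments of client.messages.create.

```python
def build_draft_request(outline: list[str], brand_voice: str, model: str,
                        max_tokens: int) -> dict:
    """One request that drafts every section of the outline in a single call."""
    sections = "\n".join(f"- {heading}" for heading in outline)
    return {
        "model": model,
        "max_tokens": max_tokens,
        # Brand voice appears once, as a cacheable system prefix.
        "system": [{
            "type": "text",
            "text": brand_voice,
            "cache_control": {"type": "ephemeral"},
        }],
        "messages": [{
            "role": "user",
            "content": (
                "Write the full post in one pass, one H2 per outline item, "
                "in the order given.\n\nOUTLINE:\n" + sections
            ),
        }],
    }


def voice_tokens_billed(voice_tokens: int, paragraphs: int, one_shot: bool) -> int:
    """Brand-voice input tokens sent per post, before any cache discount."""
    return voice_tokens if one_shot else voice_tokens * paragraphs
```

Usage would be client.messages.create(**build_draft_request(outline, voice, "claude-sonnet-4-5", 2000)). The per-paragraph edit pass then runs only where the editorial gate asks for it.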
4. Editorial review as a flagship-model call
If the editorial gate is “ask Opus whether this passes brand voice and rewrite if not,” you are paying flagship rates to do classification work. A Sonnet-tier classifier with a yes/no/needs-edit schema costs a fraction and is more deterministic.
import json

from anthropic import Anthropic

client = Anthropic()


def editorial_gate(draft: str, brand_voice: str) -> dict:
    # Sonnet-tier classifier. The brand voice sits in a cached system prefix.
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=300,
        system=[{
            "type": "text",
            "text": brand_voice,
            "cache_control": {"type": "ephemeral"},
        }],
        messages=[{
            "role": "user",
            "content": (
                "Score this draft against the brand voice above. "
                "Return only JSON with keys: verdict (pass|edit|reject), "
                "reasons (array of strings), worst_line (string).\n\n"
                f"DRAFT:\n{draft}"
            ),
        }],
    )
    # The SDK returns a Message object, not a dict: parse the text block as JSON.
    return json.loads(response.content[0].text)
5. No structured output, so you parse markdown
Half the editor’s time is reformatting headings the model emitted as bold instead of H2. Pin the output to a JSON schema or an HTML fragment, validate at the API layer, and the editor only edits prose.
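One way to “validate at the API layer” without extra dependencies is sketched below for the gate output from point 4. It parses the model's reply and rejects anything that does not match the expected keys and types before it reaches the editor. The validate_gate_output helper and its error messages are hypothetical. A JSON Schema library could do the same job.

```python
import json

ALLOWED_VERDICTS = {"pass", "edit", "reject"}


def validate_gate_output(raw: str) -> dict:
    """Parse and check the editorial gate's JSON; raise ValueError if malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    verdict = data.get("verdict")
    # Check the type first so an unhashable value cannot break the set lookup.
    if not isinstance(verdict, str) or verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"bad verdict: {verdict!r}")
    reasons = data.get("reasons")
    if not isinstance(reasons, list) or not all(isinstance(r, str) for r in reasons):
        raise ValueError("reasons must be a list of strings")
    if not isinstance(data.get("worst_line"), str):
        raise ValueError("worst_line must be a string")
    return data
```

A reply that fails validation can be retried automatically instead of landing on the editor's desk.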
6. Image generation on every H2 when the client publishes weekly
Generating four hero images a month, at a few cents apiece, is fine. Generating one per H2 across forty posts is not. Default to one image per post, cached on the topic key, regenerated only when the editor flags it.
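Here is a minimal sketch of “one image per post, cached on the topic key.” It assumes a hypothetical generate_image(topic) callable that wraps whichever image API you use. The in-memory dict stands in for whatever store you actually persist to, and HeroImageCache is likewise a hypothetical name.

```python
from typing import Callable


class HeroImageCache:
    """One hero image per topic, regenerated only when the editor flags it."""

    def __init__(self, generate_image: Callable[[str], str]):
        self._generate = generate_image   # hypothetical wrapper around your image API
        self._by_topic: dict[str, str] = {}
        self.generations = 0              # paid generations triggered so far

    def get(self, topic: str, editor_flagged: bool = False) -> str:
        key = topic.strip().lower()       # normalise the topic key
        if editor_flagged or key not in self._by_topic:
            self._by_topic[key] = self._generate(key)
            self.generations += 1
        return self._by_topic[key]
```

The generations counter is there so the cost shows up in the same per-client telemetry as the text calls.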
7. No length cap, so drafts run long
The client briefed 800 words. The model shipped 2.400. The editor cut it back. You paid for 2.400 words of output and for the editor’s hour. Cap max_tokens at the token equivalent of the brief length plus 20%, and have the prompt explicitly say “stop at 950 words even if the outline suggests more.”
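The cap needs a words-to-tokens conversion, because the brief is in words and max_tokens is in tokens. Here is a minimal sketch, assuming roughly 1.4 tokens per word. That ratio is an assumption, and non-English copy may need more, so measure your own. The helper names are hypothetical. Fractions keep the arithmetic exact.

```python
import math
from fractions import Fraction


def word_cap(brief_words: int, headroom_pct: int = 20) -> int:
    """Brief length plus headroom, in words (the number the prompt should state)."""
    return brief_words * (100 + headroom_pct) // 100


def max_tokens_for_brief(brief_words: int, headroom_pct: int = 20,
                         tokens_per_word: Fraction = Fraction(14, 10)) -> int:
    """Token cap for the API call; tokens_per_word is an assumed ratio."""
    return math.ceil(word_cap(brief_words, headroom_pct) * tokens_per_word)
```

The word limit in the prompt does the polite stopping. max_tokens is the hard backstop that cuts mid-sentence if the model ignores it, so treat a response with stop_reason "max_tokens" as a failed draft rather than a finished one.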
8. Dumping the entire CMS history into context
The “knowledge base” is the last 200 posts the client published, attached on every call. It is doing very little for quality. Retrieve the three most semantically relevant posts via a cheap embedding lookup; keep the cache prefix lean.
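Here is a minimal sketch of the retrieval step. It assumes you already store one embedding vector per published post, from whatever embedding model you use, and embed the new brief the same way. The top_k_posts helper is hypothetical. It uses plain cosine similarity over an in-memory dict, which is enough for a few hundred posts.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_posts(brief_vec: list[float], posts: dict[str, list[float]],
                k: int = 3) -> list[str]:
    """Return the k post IDs whose embeddings are closest to the brief."""
    ranked = sorted(posts, key=lambda pid: cosine(brief_vec, posts[pid]),
                    reverse=True)
    return ranked[:k]
```

Only the three returned posts go into the user message. The cached system prefix stays limited to the brand voice.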
9. Running brand-voice rewrite on prose the model already wrote in voice
The first-draft model has the brand voice in its system prompt. Then the pipeline runs a second “voice harmonisation” call on the output. Half the time it changes nothing, and you paid twice. Run the gate first (point 4) and rewrite only the paragraphs it flagged.
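Here is a minimal sketch of “gate first, rewrite only what failed.” It assumes a per-paragraph variant of the point 4 gate that returns one verdict per paragraph. rewrite_flagged and the rewrite callable are hypothetical stand-ins for your own gate and rewrite calls.

```python
from typing import Callable


def rewrite_flagged(paragraphs: list[str], verdicts: list[str],
                    rewrite: Callable[[str], str]) -> tuple[list[str], int]:
    """Rewrite only paragraphs whose verdict is not 'pass'.

    Returns the new paragraphs and the number of paid rewrite calls made.
    """
    if len(paragraphs) != len(verdicts):
        raise ValueError("need exactly one verdict per paragraph")
    out, calls = [], 0
    for text, verdict in zip(paragraphs, verdicts):
        if verdict == "pass":
            out.append(text)          # already in voice: no second call
        else:
            out.append(rewrite(text))
            calls += 1
    return out, calls
```

The returned call count is worth logging. It shows how often the harmonisation pass still earns its keep.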
If you fix only points 1, 2, and 4, the per-client API bill drops by roughly two-thirds on most retainers we have audited. The remaining six are easy money on top.
Six leaks that force a retainer rebid
10. Flat fee, unlimited revisions
The retainer says “we publish four posts a month.” It does not say “after three editor rounds, additional revisions are billed.” So the client asks for round four, round five, round six, and each round is a fresh inference run plus an editor pass. You cannot prompt-template your way out of an open commercial term.
11. No SLA, so the quality bar drifts
The first month, “good enough” was 80% of the way to publishable. By month six, the client expects 99%. Your pipeline did not get worse; the bar moved. Either rewrite the SLA into the retainer with measurable criteria for what counts as publishable, or accept that this is a senior-editor product priced as a junior-editor product.
12. You sold “AI-only” but humans review every post
The story to the client was that the agent handles it end to end. The reality is the editor reads everything because the client demanded it after one bad post. The labour line is now permanent and the price did not move. Renegotiate the deliverable as “AI-drafted, human-finished” and reprice, or invest in the editorial-gate work that makes sampling actually safe.
13. Bespoke pipeline per client, no shared infra
Eight clients, eight repositories, eight prompt sets, eight deployment targets. Every model upgrade is eight migrations. Every prompt-caching gain has to be implemented eight times. The retainer was priced as if you had one product; you have eight. Consolidate onto a single pipeline with per-client configuration, or accept that you are running a custom-build studio and price like one.
14. Pricing per blog post, not per agent-month
Per-post pricing punishes you for every efficiency win. When the agent gets cheaper, the client expects the price to drop. Per-agent-month pricing, a flat retainer for “your content agent, running,” lets you keep the margin you earned by engineering it. Most MKB agencies we talk to are still on per-post pricing and wonder why caching gains never reach the bottom line.
15. No telemetry per client, so you cannot tell which retainer is the loss-leader
The Anthropic bill is one number for the whole agency. You suspect one client is 60% of it but you cannot prove it, so you treat them all the same and average the loss across the book. Tag every API call with a client_id, build a weekly per-client cost line into your operations dashboard, and the bad-fit retainers become obvious within a month.
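Here is a minimal sketch of the per-client cost line. It assumes you log one record per API call, containing the client_id you attached and the token counts from the response's usage field. The CallRecord type and the rates in PRICES are illustrative placeholders. Check Anthropic's current price list and convert to your invoice currency. Cache-write and cache-read tokens are reported in separate usage fields and are left out here for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    client_id: str
    model: str
    input_tokens: int
    output_tokens: int


# Illustrative per-million-token rates (input, output); replace with your own.
PRICES = {
    "claude-sonnet-4-5": (3.0, 15.0),
    "claude-opus-4-1": (15.0, 75.0),
}


def cost_by_client(records: list[CallRecord]) -> dict[str, float]:
    """Sum estimated spend per client_id, largest first."""
    totals: dict[str, float] = defaultdict(float)
    for r in records:
        in_price, out_price = PRICES[r.model]
        totals[r.client_id] += (r.input_tokens * in_price
                                + r.output_tokens * out_price) / 1_000_000
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

Run it weekly over the call log and the first key in the result is the retainer to look at first.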
What to do today, and what to schedule for next week
If you read this in the morning, you can ship points 1, 2, 4 and 7 by lunchtime. They are edits to a single prompt-template file and one model-name string. The change shows up on the next Anthropic invoice cycle.
Points 10 through 15 do not get fixed in code. They get fixed by walking into the next client-retainer review with a one-page document that says “this is what we shipped, this is what it actually costs, this is what we propose for the next term.” The clients who are reasonable will renew on better terms. The clients who are not give you the answer you needed about whether to keep that retainer at all.
When we rebuilt the content pipeline for a Rotterdam-based B2B agency last quarter, the thing we ran into was that points 13 and 15 were doing more damage than 1 through 9 combined: eight forks of the same code with no per-client visibility. We ended up consolidating them into one configurable AI agent, wiring per-client telemetry into their existing BI tool, and renegotiating two retainers up and one out. The fee per client did not change much; the margin did.
The smallest thing you can do today: open your last Anthropic invoice, sort it by spend per project_id, and ask whether the client behind the top line is paying you enough to justify being your biggest line item. If the answer is no, you have your first conversation booked.
Key takeaway
Most MKB content-agent retainers leak margin in three fixable places: the flagship-model default, brand voice in the user message instead of the system message, and human review priced as an AI workflow.
FAQ
Will switching from a flagship model to Sonnet hurt the writing quality?
Not on outlines, brand-voice rewrites, social cuts or meta descriptions. The only step where readers tend to notice a difference is the first-draft long-form pass on technical topics. Keep the flagship there, swap the rest.
How much does prompt caching actually save on a content retainer?
On a four-post-a-month client with an 8k-token brand-voice doc, moving the doc to a cached system prefix typically saves €40 to €120 per client per month. The savings scale with cadence and doc size.
Should I move every client to per-agent-month pricing right now?
Move new clients today, existing clients at their next renewal. Per-post pricing punishes every efficiency win you ship; per-agent-month pricing lets you keep the margin caching and routing earn you.
What is the single biggest mistake on this list?
Not tagging API calls with a client_id. Without per-client telemetry you cannot prove which retainer is bleeding, so you average the loss across the book and underprice the next deal too.