Business

AI agent SOWs: eight new clauses after Germany v Google

A German court held Google liable for what its AI Overview made up. The week after, we rewrote every ABN agent SOW. These are the eight clauses, ranked by counsel pushback.

Jacob Molkenboer· Founder · A Brand New Company· 11 Jun 2026· 9 min

Folded ivory contract tied with twine, red wax seal, brass paperweight, green ribbon, dark fountain pen on desk.

The partner at one of our biggest clients emailed at 07:14 on a Wednesday: "Did you see the German Google thing?" He meant the ruling, reported across the trade press that week, in which a German court held Google liable for an AI Overview that fabricated facts about a named individual. By the Friday we had rewritten every active ABN statement of work. Eight new clauses. Some sailed through review. Some turned into ninety minutes on Zoom with two lawyers in pinstripes.

This is the field guide.

Why the SOW changed when the law didn't

The German ruling did not, by itself, rewrite European AI law. It applied existing press and personality-rights doctrine to a generative output, which is a perfectly normal thing for a court to do. What changed for us, as a studio that ships agents into production, was not the rules. It was the reading.

Until late May, an in-house counsel reviewing one of our SOWs would skim past the AI-output paragraph the same way they skim past the change-control clause in a standard SaaS T&C. After the ruling, they read it twice. They red-lined the second sentence. They asked us to come back on Tuesday.

We rewrote because the reading changed.

The new SOW lives at sow-v4.2.md in our template repo. Eight clauses are net new. Two old clauses got sharper. The rest of the document is the boilerplate we have been shipping since 2022. Below is the ranking, organised by what your in-house counsel will actually argue about.

The ranking is based on the eleven SOWs we sent to Dutch in-house teams between 28 May and 10 June 2026. Sample size is small. Track record is consistent.

The four clauses counsel rubber-stamps

These are the ones we expected to fight over and didn't. The pattern: the clause reads like a customer protection, the counsel reads it as a customer protection, conversation moves on.

Output provenance

The clause commits us to surfacing where every answer came from. If the agent retrieved a passage from the client's knowledge base, the passage is linked inline. If the response was generated freely, the UI labels it as a model summary, not a quoted source. The labeling rule is in the SOW. The implementation lives in the UX appendix.

Counsel reads this and thinks: our brand is not on the hook for an answer we can't trace. Stamp.

The cost is real. A RAG response with citations costs more to design, more to render, and more to test than a single generated block. We absorb the cost in the build budget rather than line-item it.

Human-in-the-loop carve-out

The clause names the specific decisions the agent cannot make autonomously. For most clients these are: anything that sends money, anything that fires or hires, anything that goes on company letterhead to a regulator, anything that touches a minor's account. The exact list is bespoke per client and lives in an appendix the client owns.

Counsel reads this and the air gets warmer. It is the clause they wanted but didn't know how to ask for. Stamp.

Model substitution notice

If we move your agent from one underlying model to another in production, we tell you 30 days ahead and run a parallel evaluation. The clause names the eval set (200 prompts you pick during onboarding) and the pass threshold (a per-dimension score at or above the prior model's baseline).

This one matters more than it looks. The substrate is moving. Hyperscalers are shifting toward terms where data flows back to providers for future model training, which is the kind of quiet change that redefines what your agent is doing this quarter versus last. Counsel doesn't always catch that. The eval clause is the lever they use to find out. Stamp.

Data deletion and export

The client can pull all stored conversation data, all embeddings derived from their content, and all fine-tuning artefacts within 30 days of a written request. We commit to a format (JSONL for transcripts, Parquet for vectors) and a destruction certificate from the underlying cloud provider.

GDPR-shaped, which is why counsel waves it through. Cheap to honour, as long as your storage is partitioned per client at the infrastructure layer. Which it should be anyway.

The four clauses counsel red-lines

These came back with track changes inside forty-eight hours.

Training data warranty

The clause: we will not train any model, internal or third-party, on your data without an explicit C-suite-signed opt-in. Including embeddings. Including evaluation prompts. Including the logs.

The problem: "we" is doing a lot of work in that sentence. ABN does not train models. The vendors we use sometimes do. Anthropic's commercial terms are clear that API inputs and outputs are not used for training by default, but the language around enterprise zero-data-retention is layered. OpenAI's business terms read similarly with different operational defaults.

Counsel red-lines this because the warranty is only as strong as the vendor commitment underneath. They ask us to do one of three things: name the vendor explicitly, exclude vendor behaviour from the warranty, or cap the warranty at "to the extent our vendor allows us to commit to it." We almost always settle on the third option with a named-vendor schedule appended.

Hallucination indemnity

The clause that came directly out of the German ruling.

If a third party brings a claim against the client that arises specifically and demonstrably from a hallucinated agent output that the client did not modify, ABN will defend the claim and indemnify up to the fees paid in the preceding twelve months.

Counsel red-lines this for two reasons.

First, the cap. They want it lifted. We rarely move on this. A studio cannot operate with an uncapped indemnity sitting on top of a probabilistic system. Second, the trigger. "Specifically and demonstrably" is doing a lot of work, and counsel correctly points out that proving causation against a black-box output is genuinely difficult. We end up adding an evidence schedule: logs preserved for 18 months, prompt and response captured per interaction, model version stamped at request time, retrieval IDs logged.

Warning

An uncapped hallucination indemnity is not a flex. It is a way to put your studio out of business when the first agent says something untrue about somebody's mother-in-law. Cap it, and put the evidence schedule in the appendix.

Vendor terms pass-through

The clause names which upstream vendor terms flow through to the client. If a model vendor's acceptable use policy prohibits a category of political ad generation, your agent inherits that prohibition. If their terms require age-gating, your agent inherits that requirement.

Counsel red-lines this because the upstream terms change. Every major vendor has revised its policies more than once in the past twelve months. Counsel asks, reasonably: are we agreeing to whatever those documents say at any future date, or to the frozen version dated 11 June 2026?

We agreed to the frozen version with a 30-day notification clock for material changes. Vendor-terms diffs go in the same monthly status report as code diffs.

Output liability cap

The headline. Total ABN liability for agent output is capped at fees paid in the prior twelve months, excluding the hallucination indemnity which has its own cap. Force majeure carved out. Gross negligence carved out. Wilful misconduct carved out.

This is the same shape of cap every services contract has carried since the 1990s, but counsel reads it harder when the deliverable is a thing that talks. We have lost two SOWs over the cap level. We have moved it three times for clients in regulated verticals where the cap had to be tied to insurance.

How the ranking helps you read your own contract

Reorder the eight in your head before you walk into a meeting with your vendor. The four rubber-stamp clauses are the ones you should already have. If your current SOW is missing output provenance or a human-in-the-loop carve-out, you are not negotiating them. You are missing them.

The four red-line clauses are where the conversation actually lives. Each has a number attached: the cap on the indemnity, the size of the eval set, the notice period, the vendor schedule. Those numbers are what your counsel will push, not the wording.

A working European frame for that conversation is the AI Act resource site, which is not the official text but is the most readable mirror of it. Once the high-risk provisions phase in, several of these clauses stop being optional anyway. We are watching that phase-in calendar more closely than the model release calendar.

What to do this afternoon

Open the active SOW you have with your AI vendor or build partner. Search for the word "training". Search for "indemnity". Search for "model". If any of those three returns nothing, schedule a thirty-minute conversation for next week.

That is the smallest thing.

When we built the email-agent for an ops team at a Rotterdam logistics firm last month, the clause that took longest was the model substitution notice. They wanted the eval set sized to 500 prompts, not 200, and they wanted the prompts to include real customer data. We agreed, ringfenced the prompts in an encrypted store, and now the eval runs nightly. The same shape of clause sits inside every project our AI agents practice is currently building.

Key takeaway

The clauses your counsel rubber-stamps protect the client; the clauses they red-line protect you. Both belong in the SOW.

FAQ

Do you actually use these clauses in production?

Yes. SOW v4.2 has been the template for every new ABN AI agent project signed since 31 May 2026. The eight clauses ship by default; clients negotiate the numbers, not the structure.

Will the German ruling apply outside Germany?

Not directly, but it sits on personality-rights doctrine shared across most EU jurisdictions. Dutch and Belgian counsel are already reading their own caselaw the same way. Treat it as a leading indicator.

What is the smallest version of these clauses you can ship?

Provenance, human-in-the-loop, and deletion. Those three address most consumer-facing risk without triggering a vendor renegotiation. Add hallucination indemnity once you have eighteen months of logs behind the evidence schedule.

Why cap the hallucination indemnity at twelve months of fees?

It matches the standard liability cap shape for services contracts and ties exposure to actual revenue. Uncapped indemnities on probabilistic output put a studio out of business on the first bad ruling.

ai agentsstrategyoperationsbusinessintegrations

Building something?

Start a project