Chat agents
Claims-intake chat agent: from 14 minutes to 95 seconds
A 29-person Utrecht broker takes 740 first-notice-of-loss reports a week. We rebuilt the intake as a chat agent. Handle time fell from 14 minutes to 95 seconds. Solvency II audit trail intact.

It is Tuesday morning at 09:14 in Utrecht. A delivery driver has just rear-ended a client's leased van on the A2. The client opens the broker's website on the hard shoulder, taps the chat agent in the corner of the page, and starts typing in Dutch with one thumb. Forty seconds later she has a claim reference, a photo upload link, and a callback slot for the loss adjuster. Nobody at the broker has touched a keyboard yet.
The broker writes about €38M in gross written premium across motor, professional indemnity, and small commercial property. Twenty-nine people. Three claims handlers, two phone lines, one shared inbox. They are not large. They are also not allowed to lose a claim notification, drop a regulated timestamp, or improvise around Solvency II. That is the actual constraint of this story.
The intake desk before we showed up
Average first-notice-of-loss intake took 14 minutes. Roughly four of those were the actual conversation. The rest went on retyping the same details into Anva, the policy admin system, and then on a second pass to log the conversation properly for compliance. On a busy Monday the inbox hit 180 emails before lunch, and the team would still be acknowledging Friday's submissions on Wednesday.
The team did not want a chatbot. They had tried one. A vendor product had been quietly switched off after three months because the handlers kept having to apologise for what it had told customers. The brief we were given was specific: the same conversation, but faster, with nothing in the regulatory record degraded. If it shaved a minute off handle time but lost an audit trail, it failed.
What an FNOL actually is
First Notice of Loss is the moment a policyholder tells the insurer something went wrong. Under Solvency II Pillar 3 and the Dutch supervisor DNB's reporting expectations, certain fields are not optional: time of incident, location, policy reference, parties involved, a damage estimate range, and a clear distinction between what the policyholder claims and what has been independently verified. The intake conversation has to produce a record that can be reconstructed years later by an auditor who was not in the room.
The mistake most teams make when they put a language model on this is treating it as a free-form chat. FNOL is not free-form. It is a structured form with a polite voice in front of it. The chat is the interview; the form is the deliverable; both have to survive an audit.
The agent's shape
We built the agent around three layers: a conversation layer that talks to the customer in their language, a schema layer that fills a strict FNOL object, and a bridge layer that writes that object into Anva, Outlook, and the audit log.
The schema is the load-bearing piece. Here is the trimmed version that ships in production:
type FNOLDraft = {
  policy_ref: string | null
  reported_at: string // ISO 8601, server-stamped
  incident_at: string | null // ISO 8601, claimant-stated
  language: 'nl' | 'en'
  party: { name: string; role: 'policyholder' | 'third_party' | 'witness' }
  location: { freeform: string; geo?: { lat: number; lng: number } }
  line_of_business: 'motor' | 'pi' | 'property' | 'other'
  damage_estimate_eur: { min: number; max: number } | null
  third_party_involved: boolean
  injuries_reported: boolean | null // null until the customer actually states it
  attachments: Array<{ kind: 'photo' | 'pdf' | 'video'; url: string }>
  verbatim_log: Array<{ role: 'user' | 'agent'; text: string; ts: string }>
  provenance: Record<string, { from_turn: number; confidence: 'stated' | 'inferred' }>
}

The agent's job, on every turn, is to do one of three things: append to verbatim_log, update a field with a provenance entry, or ask the next missing question. Nothing else. No summaries, no commentary, no opinions. The schema layer surfaces what is missing; the conversation layer decides how to ask for it politely.
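One way to keep the agent inside that three-action contract is to make the actions a closed type and let code, not the model, pick the next missing field. The sketch below assumes a hypothetical `TurnAction` union and a `REQUIRED_FIELDS` list in asking order; neither the names nor the field order come from the production build.

```typescript
// Hypothetical sketch: the three per-turn actions as a discriminated union,
// plus a helper that picks the next missing required field.

type Provenance = { from_turn: number; confidence: 'stated' | 'inferred' }

// The only three things the agent may do on a turn (names are illustrative).
type TurnAction =
  | { kind: 'append_log'; role: 'user' | 'agent'; text: string; ts: string }
  | { kind: 'update_field'; field: string; value: unknown; provenance: Provenance }
  | { kind: 'ask'; field: string }

// Assumed required fields, in the order the agent should ask for them.
const REQUIRED_FIELDS = [
  'policy_ref',
  'incident_at',
  'location',
  'line_of_business',
  'damage_estimate_eur',
  'injuries_reported',
] as const

// Returns an 'ask' action for the first required field that is still null
// or absent, or null when nothing is missing.
function nextQuestion(draft: Record<string, unknown>): TurnAction | null {
  for (const field of REQUIRED_FIELDS) {
    if (draft[field] === null || draft[field] === undefined) {
      return { kind: 'ask', field }
    }
  }
  return null
}
```

For example, `nextQuestion({ policy_ref: 'NL1234567A', incident_at: null })` yields an `ask` for `incident_at`. The conversation layer then only has to phrase that question politely in the customer's language.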
Bilingual without a language router
The broker's book is roughly 70% Dutch, 30% English, heavy on expat clients in Utrecht and Amsterdam. The first design we sketched had a language detection step at turn one and routed to two separate prompts. We threw it out within a week.
The problem is code-switching. A Dutch client who opens in Dutch will sometimes drop an English sentence ("the other driver said it was his fault") because that is how the conversation actually went. A separate prompt per language forces a hard switch and produces a confused agent that asks the same question twice.
What works: one system prompt, both languages declared up front, a mirror-the-user instruction.
Reply in the same language the customer just used.
If they switch mid-conversation, switch with them.
Always log the customer's verbatim words in the language they used.
Never translate the customer's words in the audit log.

That last line matters for Solvency II. The audit trail has to show what the policyholder actually said, not what a model thought they meant. Translation belongs in a separate, clearly-labelled column for the handler's convenience, not in the record itself.
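The separation between the record and the convenience translation can be enforced in the data shape itself. This is a sketch under the assumption that each log entry carries an optional, hypothetical `handler_translation` field that the auditor's export strips; the article does not describe how the broker actually stores translations.

```typescript
// Hypothetical sketch: the customer's words are stored once, untranslated;
// any translation for the handler lives in a separate, labelled field.

type LogEntry = {
  role: 'user' | 'agent'
  text: string // verbatim, in the language the customer used
  ts: string // ISO 8601, server-stamped
  handler_translation?: string // convenience only, never part of the record
}

// Appends a turn without mutating the existing log. The verbatim text is
// copied as-is; a translation, if supplied, is attached separately.
function appendTurn(
  log: readonly LogEntry[],
  role: LogEntry['role'],
  text: string,
  translation?: string,
): LogEntry[] {
  const entry: LogEntry = { role, text, ts: new Date().toISOString() }
  if (translation !== undefined) entry.handler_translation = translation
  return [...log, entry]
}

// The auditor's view drops the convenience column entirely.
function auditRecord(log: readonly LogEntry[]): Array<Omit<LogEntry, 'handler_translation'>> {
  return log.map(({ role, text, ts }) => ({ role, text, ts }))
}
```

Because `auditRecord` rebuilds each entry from `role`, `text` and `ts` only, a translation cannot leak into the regulated export even if someone adds more convenience fields later.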
The audit-trail problem
The first build failed compliance review on a single sentence in our spec: "the agent summarises the loss in one paragraph for the handler." DNB-regulated brokers cannot file a summary the policyholder never saw. The handler reading the summary could act on a hallucinated detail and never know.
If your chat agent produces a summary for an internal user, that summary becomes part of the regulated record. Anything in it that the customer did not actually say is now a fabricated claim on a file that auditors can read.
We rebuilt the handoff so the handler always sees three things together: the structured FNOL draft, the verbatim transcript byte for byte, and a field-by-field provenance map. If the agent fills injuries_reported: false, the provenance map points to the exact turn where the customer said "geen gewonden" ("no injuries"). If the customer never said it and the agent could only infer it from silence, the field stays null and an "unconfirmed" flag appears on the handler's view. We made not-asking visible. That single change turned the conversation from a black box into something the compliance officer could sign off on.
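The "unconfirmed" rule can be expressed as a small function over the draft and its provenance map. In this sketch, which is an assumption about the approach and not the shipped handler view, a value without a provenance entry is shown as null and flagged, so nothing reaches the handler without a traceable turn.

```typescript
// Hypothetical sketch: build the handler's field view from the draft and
// its provenance map. A field with no provenance entry is shown as
// unconfirmed rather than silently filled.

type Confidence = 'stated' | 'inferred'
type ProvenanceMap = Record<string, { from_turn: number; confidence: Confidence }>

type HandlerField = {
  field: string
  value: unknown
  status: Confidence | 'unconfirmed'
  from_turn?: number // conversation turn the value came from
}

function handlerView(
  draft: Record<string, unknown>,
  provenance: ProvenanceMap,
  fields: readonly string[],
): HandlerField[] {
  return fields.map((field): HandlerField => {
    const value = draft[field] ?? null
    const p = provenance[field]
    // No value, or a value with no traceable turn: show null and flag it.
    if (value === null || p === undefined) {
      return { field, value: null, status: 'unconfirmed' }
    }
    return { field, value, status: p.confidence, from_turn: p.from_turn }
  })
}
```

Note that a value of `false` with a provenance entry is displayed as stated, while the same `false` with no entry is downgraded to unconfirmed. That asymmetry is what makes not-asking visible.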
Why this is less theoretical now
Courts and regulators in Europe are converging on a clear position about AI-generated statements: if your system says something incorrect about a regulated event, you are on the hook, not the model vendor. There is no "the model said it" defence on a DNB file.
For an insurance broker, every claim summary the agent emits has to be defensible either as the policyholder's own words or as a clearly marked inference. The provenance map is not a nice-to-have. It is the difference between a defensible record and a public-record case.
Week one in production
The agent went live on a Wednesday. By Friday afternoon we had three incidents worth fixing in real time.
The first was generational: a retired client called the chat a "robot" and stopped responding. We added a one-line opt-out at the top of the widget: "Liever een mens? Stuur 'mens' en we bellen je binnen 5 minuten." ("Rather talk to a person? Send 'mens' and we will call you within 5 minutes.") About 4% of conversations use it. That is fine; the routing works as intended, and the team would rather lose a chat than annoy a client of forty years.
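The opt-out only works if the keyword check is forgiving about how people type it but strict enough not to fire on ordinary sentences. A minimal sketch, assuming the check matches a message that is just the word "mens" (the function name is hypothetical):

```typescript
// Hypothetical sketch of the opt-out check. Matching only a message that is
// exactly the keyword (ignoring case, surrounding whitespace and trailing
// punctuation) avoids triggering on sentences that merely contain "mens".

function wantsHuman(message: string): boolean {
  const normalised = message.trim().toLowerCase().replace(/[.!?]+$/, '')
  return normalised === 'mens'
}
```

So `wantsHuman(' Mens! ')` routes to a person, while a sentence such as "de mens achter het stuur" ("the person behind the wheel") does not.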
The second was a motor claim that went in with damage_estimate_eur: null because the customer said "I don't know yet, the bonnet is bent." The handler had to call back, which defeats the point. We added a clarifying micro-question that offers a vocabulary list (scratch, dent, panel damage, serious, total loss) and maps each label to a euro range. The ranges are rough, but they are good enough to triage.
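The label-to-range mapping is a lookup table, and the resulting estimate fits the schema's `damage_estimate_eur` shape. The five labels come from the text above; the euro bounds below are placeholder values chosen for illustration, not the broker's bands. Tagging the estimate as `'inferred'` in provenance is an assumption about how such a value should be recorded.

```typescript
// Hypothetical sketch: map the micro-question's vocabulary labels to euro
// ranges. The labels are from the article; the bounds are placeholders.

type DamageLabel = 'scratch' | 'dent' | 'panel_damage' | 'serious' | 'total_loss'
type EuroRange = { min: number; max: number }

const DAMAGE_BANDS: Record<DamageLabel, EuroRange> = {
  scratch: { min: 0, max: 500 },
  dent: { min: 250, max: 1_500 },
  panel_damage: { min: 1_000, max: 5_000 },
  serious: { min: 5_000, max: 20_000 },
  total_loss: { min: 20_000, max: 100_000 },
}

// Returns a copy of the band plus a provenance entry: the range is inferred
// from the customer's chosen label, not stated by them.
function estimateFromLabel(label: DamageLabel, turn: number) {
  return {
    damage_estimate_eur: { ...DAMAGE_BANDS[label] },
    provenance: { from_turn: turn, confidence: 'inferred' as const },
  }
}
```

Copying the band means a later edit to one claim's estimate cannot change the shared table.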
The third was the kind of bug that only shows up in production. One conversation lost the policy reference because the customer typed it as NL-1234.567/A instead of NL1234567A. The agent had cheerfully asked them to "try again" three times. We added normalisation at the schema layer, not the conversation layer. The conversation should not be teaching the customer how the broker formats their own policy numbers.
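Normalising at the schema layer can be as simple as stripping separators and validating the compact form. The pattern here (two letters, seven digits, one letter) is inferred from the single example above and is an assumption about the broker's format; `normalisePolicyRef` is a hypothetical name.

```typescript
// Hypothetical sketch: normalise a typed policy reference to the compact
// form (e.g. NL1234567A) before validating it. The format is assumed from
// one example in the text.

const POLICY_REF = /^[A-Z]{2}\d{7}[A-Z]$/

function normalisePolicyRef(raw: string): string | null {
  // Drop whitespace and common separators (. - /), then upper-case.
  const compact = raw.replace(/[\s.\-\/]/g, '').toUpperCase()
  return POLICY_REF.test(compact) ? compact : null
}
```

With this in place, `NL-1234.567/A` and `nl 1234567 a` both become `NL1234567A`, and the conversation only asks again when the reference is genuinely incomplete.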
The numbers, six months in
740 FNOL submissions a week, averaged across May 2026. 96% resolved without a handler touching the conversation. The remaining 4% are escalated by the agent itself when the loss type is bodily injury, total loss above €100k, or a third party is uncontactable. Those escalations are routed to a named human within sixty seconds.
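The three escalation triggers are deterministic rules, so they can live in code rather than in the prompt. The input shape is an assumption: the trimmed `FNOLDraft` type has no total-loss or third-party-contact fields, so they are modelled separately here. Treating an unknown estimate on a total loss as over the threshold is also an assumption, chosen to err towards a human.

```typescript
// Hypothetical sketch of the three self-escalation rules.

type EscalationInput = {
  bodily_injury: boolean
  total_loss: boolean
  estimate_max_eur: number | null
  third_party_uncontactable: boolean
}

type EscalationReason = 'bodily_injury' | 'total_loss_over_threshold' | 'third_party_uncontactable'

const TOTAL_LOSS_THRESHOLD_EUR = 100_000

function escalationReasons(c: EscalationInput): EscalationReason[] {
  const reasons: EscalationReason[] = []
  if (c.bodily_injury) reasons.push('bodily_injury')
  // Uses the top of the estimate range; an unknown estimate counts as over.
  if (c.total_loss && (c.estimate_max_eur === null || c.estimate_max_eur > TOTAL_LOSS_THRESHOLD_EUR)) {
    reasons.push('total_loss_over_threshold')
  }
  if (c.third_party_uncontactable) reasons.push('third_party_uncontactable')
  return reasons // an empty array means the agent keeps the conversation
}
```

Returning every matching reason, rather than a single boolean, gives the named handler the context for the escalation at a glance.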
Average handle time, measured from first customer message to a complete FNOL record in Anva: 95 seconds. The 14-minute baseline was measured the same way the year before, on the same team, on a comparable Monday-to-Friday spread.
Time saved per week: roughly 145 hours of intake-handler time. The team did not shrink. They moved to claim adjustment and recoveries, which, for €38M of GWP and a three-person rota, had been chronically understaffed. The handlers we spoke to in month four said the work is harder now, and they prefer it.
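As a rough cross-check of the 145-hour figure, assuming the saving per submission is the 14-minute baseline minus the 95-second handle time, and that it applies only to the 96% of submissions no handler touches:

```latex
\begin{aligned}
\Delta t &= 14 \times 60\,\text{s} - 95\,\text{s} = 745\,\text{s} \\
N &= 0.96 \times 740 = 710.4\ \text{submissions per week} \\
N \cdot \Delta t &= 710.4 \times 745\,\text{s} \approx 529{,}000\,\text{s} \approx 147\,\text{h per week}
\end{aligned}
```

Applying the same saving to all 740 submissions gives about 153 hours, so "roughly 145 hours" sits at the conservative end of that range.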
The agent did not replace the handlers. It collapsed the part of the job that was retyping into Anva and freed the team for the part that needed judgement.
Things we would not do again
Three.
We tried letting the agent decide when a conversation was "done". It was wrong about 8% of the time, usually closing too early on a customer who was still typing. We replaced that with a hard rule: the conversation closes when every required field is filled or explicitly escalated, and never before. A model is bad at reading the silence between human messages; a schema is excellent at it.
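The hard closing rule is a pure check over the draft, with no model call involved. A sketch, where the field names and the set of escalated fields are illustrative assumptions:

```typescript
// Hypothetical sketch of the closing rule: the conversation may close only
// when every required field is either filled or explicitly escalated.
// The model is never asked whether the conversation is "done".

function canClose(
  draft: Record<string, unknown>,
  required: readonly string[],
  escalated: ReadonlySet<string>,
): boolean {
  return required.every(
    (field) => escalated.has(field) || (draft[field] !== null && draft[field] !== undefined),
  )
}
```

A customer who pauses mid-sentence leaves a field empty, so `canClose` stays false and the widget keeps waiting, however long the silence.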
We tried using the same model for the customer-facing chat and the internal handler summary. The customer chat needs warmth and forgiveness; the handler summary needs cold structure. Same model, two different prompts, but the warmth bled into the summary and the handlers started getting "the customer seemed quite distressed" lines they did not need. We split them onto separate prompt families and the complaints stopped.
We tried automating the callback scheduling at intake. Customers were accepting slots they could not keep, because they were still standing next to a wrecked car in the rain, phone in hand. We moved the callback offer until after the policyholder confirms they are somewhere they can talk. Show rate went from 71% to 94%.
What this kind of build costs you to maintain
Once a chat agent sits on the critical path for a regulated process, the maintenance shape changes. It is not a chatbot you ship and forget. Every model upgrade has to be re-tested against the FNOL schema. Every change to Anva fields has to be reflected in the bridge layer. Every quarter, the compliance officer wants to see a sample of fifty conversations and the provenance map for each.
Model behaviour drift is a worry worth taking seriously: if a model silently changes behaviour, you may not notice until an audit. For regulated intake that means version-pinning your model, holding a regression suite of a hundred-plus historical conversations, and re-running it before every model bump. Cheaper than the alternative.
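A regression run can be a plain loop over historical conversations, comparing the candidate's extracted fields with what a handler confirmed. The `Extractor` interface below is an assumed stand-in for "candidate model plus schema layer"; the article does not describe the broker's harness.

```typescript
// Hypothetical sketch of a regression run before a model bump.

type RegressionCase = {
  id: string
  transcript: string[]
  expected: Record<string, unknown> // fields a handler confirmed
}

type Extractor = (transcript: string[]) => Promise<Record<string, unknown>>

type Mismatch = { id: string; field: string; expected: unknown; actual: unknown }

async function runRegression(cases: RegressionCase[], extract: Extractor): Promise<Mismatch[]> {
  const mismatches: Mismatch[] = []
  for (const c of cases) {
    const actual = await extract(c.transcript)
    for (const [field, expected] of Object.entries(c.expected)) {
      // JSON comparison suits the plain values in a draft, but it is
      // sensitive to key order; a deep-equality helper would be stricter.
      if (JSON.stringify(actual[field]) !== JSON.stringify(expected)) {
        mismatches.push({ id: c.id, field, expected, actual: actual[field] })
      }
    }
  }
  return mismatches // promote the candidate model only if this is empty
}
```

Reporting mismatches per field, not just pass or fail per conversation, tells you whether a new model broke one extraction rule or the whole schema.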
The boring layer is most of the job
For the Utrecht broker, the nine-week build broke down roughly into four weeks for the schema and provenance map, three for the conversation, and two for the Anva and Outlook bridge. The hard part was not the chat. It was the boring layer underneath: the field-by-field provenance, the verbatim log, the handoff a compliance officer can read in twenty seconds. We have seen four other teams try a regulated-intake chat agent without that layer; all four have since been switched off.
If you want to test whether this fits your own desk this week: pick one process that takes more than ten minutes per touch, draw the structured object it has to produce, and hand-write the provenance map for ten real conversations. You will know within a day whether an agent earns its place.
Key takeaway
A regulated chat agent stands or falls on the boring layer underneath: a strict schema, a verbatim log, and a field-by-field provenance map the auditor can read.
FAQ
Does a chat agent for FNOL satisfy Solvency II reporting requirements?
It can, but only if the regulated fields are filled from a strict schema, the customer's verbatim words are logged untranslated, and every filled field is traceable to a specific conversation turn.
Why not use a separate prompt per language for Dutch and English?
Customers code-switch mid-conversation. A hard router forces an awkward switch and produces duplicated questions. One bilingual prompt with a mirror-the-user instruction is more reliable.
What stays a human task after the agent goes live?
Bodily-injury claims, total losses above a set threshold, and any case where a third party is uncontactable. The agent self-escalates these to a named handler within a minute.
How do you stop a model upgrade from quietly breaking compliance?
Pin the model version, hold a regression suite of around a hundred historical conversations, and re-run the suite against any candidate model before promoting it to production.