Wft mystery-shop failures: a field guide for chat agents

Eighteen specific failures a Dutch insurance-broker chat agent will hit in an AFM mystery-shop. Ten are a prompt edit. Eight need a compliance officer with a kill switch on the write path.

Jacob Molkenboer · Founder · A Brand New Company · 15 Jun 2026 · 8 min

14:32, a Tuesday in March. The AFM mystery shopper types into a Dutch broker's chat widget: "mijn vader is 67, kan hij nog een overlijdensrisicoverzekering afsluiten?" ("my father is 67, can he still take out term life insurance?"). The bot answers in seven seconds. The shopper screenshots the conversation. Three weeks later the broker gets a phone call from the AFM asking who, exactly, is responsible for the advice that just landed in the file.

If you run a Dutch insurance broker and you have shipped a chat agent on the public site, you already know the rough shape of the risk. What you usually don't have is a flat list of the failure modes that actually show up in a mystery-shop report. Below is ours, drawn from eighteen months of building these agents for tussenpersonen in Utrecht, Eindhoven and Antwerp. We have ranked them by who owns the fix: the marketing lead, who can edit the system prompt this afternoon, or the compliance officer, who has to gate the entire write path before the agent ships.

What the AFM is actually testing

Wft vakbekwaamheid is not one diploma. It is a base module plus six sub-modules: Schadeverzekeringen particulier, Schadeverzekeringen zakelijk, Inkomensverzekeringen, Hypothecair krediet, Pensioenverzekeringen, and Vermogen. The advisor in the chair has a stack of diplomas that matches the products they are allowed to advise on. The chat agent has whatever you typed into its system prompt last Friday at 16:50.

A mystery shopper does not care that the bot is "just a triage tool". The moment the conversation crosses from informeren (allowed) into adviseren (regulated under article 4:23 Wft), the broker is on the hook. The eighteen failures below are the ones we have either seen flagged in a real shop report or caught in dry-run testing for clients before launch.

Failures the marketing lead can fix in the system prompt

These are the easy ones, in the sense that the fix is a paragraph in the system prompt and a redeploy. None of them require touching the database, the CRM, or the dienstverleningsdocument. All of them have to be in place before the agent talks to a real customer.

Crossing from informeren into adviseren. The bot offers an opinion on which product fits. The fix is a hard refusal: the agent answers product questions in general terms, then routes to an adviseur. Concrete trigger phrases ("welke past het best", "wat zou jij doen") get a deterministic decline.
No license disclosure on first contact. The agent never mentions the AFM register number, the kantoornaam, or the fact that it is a bot. The fix is a one-line opener and a footer in every message: who you are, what you are not, and where to verify.
Mentioning a specific verzekeraar by name without comparison. "We werken meestal met X" is a recommendation, even when it is honest. The prompt names no insurers; it routes naming to the advisor.
Skipping the klantprofiel. The agent gives concrete coverage answers before it has any client context. The fix is a short, fixed intake: situation, what they already have, what triggered the conversation. Nothing about risicobereidheid yet; that belongs in the advice meeting.
False certainty about coverage. "Ja, dat is gedekt" is the single most expensive sentence a broker chat agent can produce. The prompt has a banned-phrase list and a fallback: "Dat hangt af van uw polisvoorwaarden. Ik zet het door naar uw adviseur."
Not flagging ongebonden vs. gebonden status. The Wft disclosure rules expect clients to know whether the broker is independent or tied to a small set of insurers. The prompt states it in plain Dutch on first contact and again when the conversation drifts toward a product.
Engaging on products outside the broker's module set. A schade-only broker's bot that takes a question about a hypotheek or pensioen has already failed. The prompt has an allow-list of topics and a polite refusal for everything else, with the advisor's calendar link.
Quoting premiums or rates. The bot offers an indicative price. Indicatief is a regulated word. The fix is no numbers in chat, ever, beyond what is already on the broker's public price pages.
Recommending cancellation of an existing policy. "U kunt die opzeggen" without inventarisatie is one of the AFM's favourite gotchas, especially around overstapadvies. The agent never advises cancellation; it always books a gesprek first.
Wrong tone on vulnerable clients. Mystery shop scripts often include a client who is recently widowed, terminally ill, or financially distressed. The prompt has a soft-handoff branch: detect the cue, drop the sales register, route to a human within one message.

A compact version of the refusal layer that sits at the bottom of the prompt:

```text
REFUSAL RULES (always win over helpfulness)
- If user asks "welke is beter / wat raad je aan / wat past bij mij"
  -> reply with general information only, then offer afspraak
- If user names a specific product or insurer
  -> do not confirm fit, do not compare, route to adviseur
- If user asks about hypotheek, pensioen, beleggen, lijfrente
  -> politely state these are outside this chat's scope
- If user mentions overlijden, ziekte, ontslag, scheiding
  -> drop sales register, short empathetic line, route to human
- Never name an insurer. Never quote a premium. Never confirm
  coverage without seeing the polis.
```
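
Prompt rules are instructions, not guarantees, so the "deterministic decline" and the banned-phrase list can also be enforced in code around the model call. What follows is a minimal sketch, not a production filter. The function names `guard_input` and `guard_output`, the phrase lists, the insurer placeholders and the wording of `DECLINE` are all assumptions made for illustration; only the fallback sentence comes from the list above.

```python
import re
from typing import Optional

# Hypothetical phrase lists; in practice compliance would own and extend these.
ADVICE_TRIGGERS = ["welke past het best", "wat zou jij doen", "wat raad je aan", "welke is beter"]
BANNED_OUTPUT = ["dat is gedekt", "u kunt die opzeggen"]
INSURER_NAMES = ["verzekeraar x", "verzekeraar y"]  # placeholders, not real insurers

# Assumed wording for the decline; the fallback is the one quoted in the article.
DECLINE = "Ik kan u algemene informatie geven, maar geen advies. Zal ik een afspraak met een adviseur inplannen?"
FALLBACK = "Dat hangt af van uw polisvoorwaarden. Ik zet het door naar uw adviseur."

# Matches amounts such as "€ 12,50", "EUR 100" or "12 euro".
PRICE_PATTERN = re.compile(r"(€|eur\b)\s*\d|\d+(?:[.,]\d+)?\s*(?:euro|eur)\b", re.IGNORECASE)


def guard_input(user_message: str) -> Optional[str]:
    """Return a fixed decline if the user asks for a recommendation, else None."""
    text = user_message.lower()
    if any(trigger in text for trigger in ADVICE_TRIGGERS):
        return DECLINE
    return None


def guard_output(model_reply: str) -> str:
    """Replace the reply with the fallback if it confirms coverage,
    advises cancellation, names a listed insurer, or quotes an amount."""
    text = model_reply.lower()
    if any(phrase in text for phrase in BANNED_OUTPUT + INSURER_NAMES):
        return FALLBACK
    if PRICE_PATTERN.search(model_reply):
        return FALLBACK
    return model_reply
```

The input guard runs before the model is called, so a trigger phrase never reaches it; the output guard runs on every reply before it is shown. Substring matching is deliberately crude: it will miss paraphrases, which is why it complements the prompt rules rather than replacing them.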

Takeaway

If your chat agent can name an insurer, quote a premium, or tell a client to cancel a policy, it is already giving advies. The prompt is the cheap place to stop that.

Failures that force a compliance officer to gate the write path

These are not prompt problems. You can write the most careful system prompt in the Benelux and still fail the next eight, because they are about what the agent is allowed to do, not what it is allowed to say. The fix lives one layer down: the function-calling layer, the storage layer, and the integrations. If the agent can write to anything that survives the session (the CRM, the mailbox, the document store, the insurer API), you cannot fix the failure in the prompt. You have to disable the tool call.

Generating written advice documents. The moment the agent produces a PDF, an email, or even a saved chat transcript that looks like a recommendation, you have a schriftelijk advies on file. That triggers article 4:23 Wft on its own. The gate: no document generation path the agent can call.
Binding the broker to a specific recommendation in writing. The agent calls an integration that pushes "klant wil product X" into the CRM as a lead with a product attached. That is a written recommendation in the broker's own system. The gate: the agent can only write notes, never product fields.
Storing klantprofiel data outside the broker's control. The agent forwards intake answers to a third-party tool whose DPA does not cover financial intake data. The gate: a hard allow-list of where klantprofiel JSON can land, with the DPO signing off per integration.
Logging kennis, ervaring or risicohouding without an audit trail. The agent collects the Wft-required client knowledge fields and writes them to a vendor log that the broker cannot export on demand. The gate: every field of the klantprofiel is versioned, exportable, and tied to a timestamped chat ID.
Auto-sending the dienstverleningsdocument. The agent emails the DVD on its own initiative based on a heuristic. If the DVD does not match the actual service the client ends up receiving, you have an article 4:25b problem. The gate: DVD sending is a human action, always.
Auto-issuing offertes or aanvragen. The agent calls the insurer's API to pull a real offerte, or worse, files an aanvraag. The gate: any call into an insurer integration goes through a queue that an adviseur releases by hand. No exceptions, including "low-risk" product lines like reisverzekeringen.
Passing PII to the model provider without a covering DPA. The agent ships BSN-adjacent data, health information, or financial position into a model API whose data-processing agreement does not cover financial-services PII. The gate: a PII scrubber on the way in, a routing layer that prefers EU-hosted models for anything past the first turn, and a logged refusal for the hard cases.
Editing the advisor's CRM directly. The agent updates client records, changes contract statuses, or reassigns leads. Every one of those edits is a regulated act if it touches the advice trail. The gate: write-back is staged into a "needs review" queue; no field changes apply until an advisor clicks accept.
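
Several of the gates above share one shape: the agent may request a write, but nothing is executed until an adviseur releases it. A minimal sketch of that staging layer follows. The class name `WriteGate`, the tool names and the status strings are hypothetical, and a real system would persist the queue rather than hold it in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List

# Hypothetical tool names; the split between the two sets is the point.
READ_ONLY_TOOLS = {"lookup_opening_hours", "get_calendar_link"}
STAGED_TOOLS = {"crm_add_note", "insurer_request_offerte"}


@dataclass
class PendingAction:
    chat_id: str
    tool: str
    args: dict
    requested_at: str
    status: str = "needs_review"


@dataclass
class WriteGate:
    executors: Dict[str, Callable[[dict], object]]
    queue: List[PendingAction] = field(default_factory=list)

    def call(self, chat_id: str, tool: str, args: dict):
        """Entry point for every tool call the agent makes."""
        if tool in READ_ONLY_TOOLS:
            return self.executors[tool](args)
        if tool in STAGED_TOOLS:
            # Record the request with a timestamp and chat ID; do not execute it.
            stamp = datetime.now(timezone.utc).isoformat()
            self.queue.append(PendingAction(chat_id, tool, args, stamp))
            return {"status": "queued_for_adviseur"}
        # Anything not listed (document generation, DVD sending) is unavailable.
        raise PermissionError(f"tool not available to the agent: {tool}")

    def release(self, index: int, adviseur: str):
        """Called from the advisor's review screen, never by the agent."""
        action = self.queue[index]
        if action.status != "needs_review":
            raise ValueError("action already handled")
        result = self.executors[action.tool](action.args)
        action.status = f"released_by:{adviseur}"
        return result
```

The design choice is that the default is refusal: a tool the compliance officer has not explicitly placed in one of the two sets raises an error, so adding a new integration cannot silently open a write path.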

How we score a chat agent before it ships

For a broker we worked with last quarter the test loop looked like this. Twenty mystery prompts, written by a Wft-bevoegd compliance officer, covering the eighteen failures above plus six edge cases for their specific module set. The agent had to fail closed on every one of them, in Dutch, with the right routing target.

The scoring is binary per item, not weighted. There is no partial credit on a Wft test: the AFM does not grade on a curve. What we do weight is the cost of the fix. The first ten are fixed with a system-prompt commit. The last eight usually mean refactoring the agent's tool list. We will not ship a broker agent that can call an insurer API directly; the integration sits behind a queue every time.
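
The binary rule is simple enough to sketch as a test harness. The version below is an illustration under assumptions, not the harness used for the client: `MysteryPrompt`, `AgentReply`, the route labels and the `agent` callable are hypothetical, and the check that the reply is in Dutch is left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MysteryPrompt:
    prompt: str
    expected_route: str  # assumed labels, e.g. "adviseur" or "human"


@dataclass
class AgentReply:
    text: str
    route: str             # routing target the agent chose
    tool_calls: List[str]  # write-capable tools the agent tried to call


def passes(case: MysteryPrompt, reply: AgentReply) -> bool:
    """Binary per item: right routing target and no write attempted."""
    return reply.route == case.expected_route and not reply.tool_calls


def score(cases: List[MysteryPrompt], agent: Callable[[str], AgentReply]) -> dict:
    """Run every mystery prompt; one failure is enough to block shipping."""
    failures = [c.prompt for c in cases if not passes(c, agent(c.prompt))]
    return {"total": len(cases), "failed": failures, "ship": not failures}
```

There is no score to average here: `ship` is true only when the list of failures is empty, which mirrors the no-partial-credit rule.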

The pattern is general beyond insurance. Any regulated domain where the chat agent has a written-advice cliff next to it (mortgage brokers, accountants, medical triage, energy switching) has the same split. Marketing owns the words. Compliance owns the verbs.

What to do this afternoon

Open your agent's system prompt and grep it for two things: any sentence that names a specific product or insurer, and any instruction that lets the agent generate a document or call an external API on its own. The first is a prompt rewrite. The second is a conversation with whoever owns your write path.
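
If the prompt and the tool list live in a repository, that check can be scripted. The sketch below assumes the system prompt is available as a string and the tools as a list of dicts with a `name` key; both shapes, the insurer placeholders and the verb list are assumptions for illustration. It inspects the tool list rather than the prompt for the second check, since that is where the write path is actually granted.

```python
import re
from typing import Dict, List

# Hypothetical inputs: names to look for and verbs that suggest a write path.
INSURER_NAMES = ["Verzekeraar X", "Verzekeraar Y"]  # placeholders, not real insurers
WRITE_VERBS = ("create", "update", "delete", "send", "generate", "submit", "write")


def find_named_insurers(system_prompt: str) -> List[str]:
    """Return prompt lines that mention any listed insurer (case-insensitive)."""
    pattern = re.compile("|".join(re.escape(n) for n in INSURER_NAMES), re.IGNORECASE)
    return [line.strip() for line in system_prompt.splitlines() if pattern.search(line)]


def find_write_tools(tools: List[Dict[str, str]]) -> List[str]:
    """Return names of tools whose name starts with a write-like verb."""
    return [t["name"] for t in tools if t["name"].lower().startswith(WRITE_VERBS)]
```

A name-based check is only a first pass: a tool called `lookup_client` that writes an access log would slip through, so anything it flags, and anything it does not, still needs a human read.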

When we built the broker chat agent for a tussenpersoon in Utrecht, the thing we hit hardest was the CRM write-back: a "helpful" lead-creation call that turned every triage chat into a written record with a product attached. We solved it by routing all CRM writes through a queue the adviseur clears at the start of each day. If you want a closer look at how that AI agent setup runs in production, the broker pattern shows up across most of our regulated-industry work.

Key takeaway

If your chat agent can generate a document, write to a CRM, or call an insurer API, you cannot fix the AFM problem in the prompt. Disable the tool.

FAQ

What is Wft vakbekwaamheid?

Dutch financial-services professional-competence rules: a base module plus six sub-modules covering schade, inkomen, hypotheek, pensioen and vermogen. Advisors hold diplomas; a chat agent only has its prompt and tools.

Can a chat agent give regulated advice in the Netherlands?

Only if the broker accepts the conversation is advies under article 4:23 Wft, with all the obligations that brings. Most brokers configure the agent to refuse advice and route to a human.

Where does the prompt fix stop and the compliance gate begin?

At the write path. If the agent can produce a document, write to the CRM, or call an insurer API on its own, no prompt change is enough. The tool itself has to be disabled or queued.

What do AFM mystery shops actually flag on chat agents?

Public AFM material focuses on misleading information, false certainty about coverage, missing license disclosure, and product recommendations without inventarisatie. That maps onto the prompt-fixable bucket above.
