
Law-firm intake chat agent: handoff triggers that work

Friday 4pm, the intern is on holiday, and a new client mentions a court date eleven days out. The intake form did its job. The intake form is also why nobody noticed.

Jacob Molkenboer · Founder · A Brand New Company · 20 Aug 2024 · 9 min read


A Friday afternoon at a mid-sized Amsterdam law firm. A new intake email lands in the general inbox. The client writes, in Dutch: "Mijn werkgever heeft me ontslagen en de kantonrechter heeft een zitting op 18 juni." ("My employer has dismissed me and the subdistrict court has a hearing on 18 June.") The intern who normally screens intake is on holiday. The partner is in court. The senior associate is in a deposition. The email sits.

The court date is eleven days away.

This is the problem an intake agent is supposed to solve. Not by answering the legal question. Not by quoting fees. Not by promising a callback. By doing the one thing that actually matters: noticing that a deadline has landed in the inbox, and pulling a human into the room before the deadline does.

What the intake agent is not allowed to do

Before we wired anything up, we wrote down the things the agent is forbidden from doing under any circumstance. The Dutch bar association has clear rules on what counts as legal advice and who is allowed to give it. A language model is not on that list. So the agent never tells a prospective client whether they have a case, what statute applies, what to do next, or how much it will cost.

What the agent does is narrower and more useful. It collects facts. It asks the questions the paralegal would have asked first anyway. It detects the three signals that mean a human needs to step in today, not in the morning. And it produces a clean, structured handoff so the human walks into the room already knowing what is on the table.

The three handoff triggers

Almost every law-firm intake that actually matters carries at least one of three signals in the first message. We picked these three because they are concrete, machine-detectable, and almost always correlate with urgency. If any of them fires, the conversation pauses and a real paralegal is paged.

A deadline word or date

Dutch civil procedure runs on dates. Verjaring (limitation), dagvaarding (writ of summons), zitting (hearing), beslag (attachment), termijn (time limit), hoger beroep (appeal), cassatie (cassation). Each of these words implies a clock. Most intake messages that contain one of them also contain an actual date, in numeric day-month-year form ("12-06-2026"), Dutch long form ("12 juni 2026"), or relative form ("over twee weken", "in two weeks"). The agent flags either signal: the word or the date.
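To make the three date forms concrete, here is a minimal TypeScript sketch of a regex pre-scan that finds the deadline words and converts numeric and long-form Dutch dates to ISO. It is an assumption for illustration, not the firm's code: in the article the model does the extraction, and a scan like this would only serve as a deterministic cross-check. The names `scanDeadlineSignals` and `allMatches` are hypothetical, and relative phrases are returned unresolved because resolving them needs the conversation start time.

```typescript
// Hypothetical regex pre-scan for deadline signals; a cross-check, not the extractor.
const DEADLINE_WORDS = ["verjaring", "dagvaarding", "zitting", "beslag", "termijn", "hoger beroep", "cassatie"];
const DUTCH_MONTHS: Record<string, number> = {
  januari: 1, februari: 2, maart: 3, april: 4, mei: 5, juni: 6,
  juli: 7, augustus: 8, september: 9, oktober: 10, november: 11, december: 12,
};

const pad2 = (n: number): string => (n < 10 ? "0" : "") + n;

// Collect every match of a global regex.
function allMatches(re: RegExp, text: string): RegExpExecArray[] {
  const out: RegExpExecArray[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) out.push(m);
  return out;
}

function scanDeadlineSignals(text: string): { words: string[]; isoDates: string[]; relative: string[] } {
  const lower = text.toLowerCase();
  // Substring match, so "zittingsdatum" also counts as "zitting".
  const words = DEADLINE_WORDS.filter((w) => lower.includes(w));

  const isoDates: string[] = [];
  // Numeric day-month-year, e.g. "12-06-2026" (no range validation in this sketch).
  for (const m of allMatches(/\b(\d{1,2})[-\/](\d{1,2})[-\/](\d{4})\b/g, lower)) {
    isoDates.push(`${m[3]}-${pad2(Number(m[2]))}-${pad2(Number(m[1]))}`);
  }
  // Dutch long form, e.g. "12 juni 2026".
  const longForm = new RegExp(`\\b(\\d{1,2}) (${Object.keys(DUTCH_MONTHS).join("|")}) (\\d{4})\\b`, "g");
  for (const m of allMatches(longForm, lower)) {
    isoDates.push(`${m[3]}-${pad2(DUTCH_MONTHS[m[2]])}-${pad2(Number(m[1]))}`);
  }

  // Relative form, e.g. "over twee weken"; left unresolved here.
  const relative = allMatches(/\bover (\S+) (dagen|dag|weken|week|maanden|maand)\b/g, lower).map((m) => m[0]);
  return { words, isoDates, relative };
}
```

Unresolved relative phrases are deliberately kept apart from `isoDates`, matching the article's rule that only a resolved date can trigger a handoff.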

A number that looks financial

Amounts trigger urgency for two reasons. First, an amount usually means a counterparty is already making a claim. Second, amounts cross procedural thresholds: for claims up to €25,000 the kantonrechter (subdistrict court) is competent, above that the rechtbank (district court). The agent does not need to know that. It simply flags any euro figure of €1,000 or more and any number that looks like a case reference (the C/13/... format used by the Amsterdam District Court, the ECLI:NL: prefix used by published case law, the 200.... format used by the appeal courts).
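A comparable sketch for amounts and case references, assuming Dutch number formatting ("€12.500,50" is twelve thousand five hundred euros and fifty cents) and a `€` or `EUR` prefix. The €1,000 threshold is the article's; the reference pattern only matches the C/13/ and ECLI:NL: prefixes named above, and appeal-court numbers are left out rather than guessed. All names are hypothetical.

```typescript
// Hypothetical scanner for euro amounts and case references; not the firm's code.
type AmountHit = { raw: string; eur: number };

const MIN_EUR = 1000; // the article's threshold

// "12.500" + "50" -> 12500.5 (Dutch: '.' groups thousands, ',' marks decimals).
function parseDutchAmount(intPart: string, decPart: string | undefined): number {
  const whole = Number(intPart.replace(/\./g, ""));
  return decPart ? whole + Number(`0.${decPart}`) : whole;
}

function scanAmountsAndRefs(text: string): { amounts: AmountHit[]; caseRefs: string[] } {
  const amounts: AmountHit[] = [];
  // Requires a "€" or "EUR" prefix; suffix forms such as "1.250 euro" are not covered.
  const euroRe = /(?:€|\beur\b)\s?(\d{1,3}(?:\.\d{3})+|\d+)(?:,(\d{1,2}))?/gi;
  let m: RegExpExecArray | null;
  while ((m = euroRe.exec(text)) !== null) {
    const eur = parseDutchAmount(m[1], m[2]);
    if (eur >= MIN_EUR) amounts.push({ raw: m[0], eur });
  }

  const caseRefs: string[] = [];
  // Only the prefixes named in the article; the rest of each token is taken as-is.
  const refRe = /\b(?:C\/13\/|ECLI:NL:)[^\s,;()]+/g;
  while ((m = refRe.exec(text)) !== null) {
    caseRefs.push(m[0].replace(/\.+$/, "")); // drop a sentence-final full stop
  }
  return { amounts, caseRefs };
}
```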

A counterparty name

This is the trickiest signal and the most valuable. A client who names their employer, their landlord, their ex-partner's company, or the municipality has already told you who is on the other side. The firm needs to run a conflict check before any substantive conversation starts. If the agent extracts a counterparty, the handoff is non-negotiable: nobody at the firm should reply until the conflict check has run.

Warning

Conflict-check failures are how law firms lose insurance cover. If the agent extracts a counterparty and a human replies before the check runs, the agent has made things worse, not better. Build the handoff so it blocks the reply path until the check completes.
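One way to make the reply path block, sketched under the assumption that every outbound human message goes through a single send function: the gate fails closed, so any matter status other than cleared refuses the send. The status names besides "Awaiting conflict check", and the `transport` callback, are hypothetical.

```typescript
// Hypothetical statuses; "awaiting_conflict_check" mirrors the article's
// "Awaiting conflict check", the other two are assumptions.
type MatterStatus = "awaiting_conflict_check" | "conflict_cleared" | "conflict_found";
type MatterRecord = { id: string; status: MatterStatus };

// The only path by which anyone at the firm can message a prospective client.
// It fails closed: any status other than "conflict_cleared" blocks the send.
function sendHumanReply(
  matter: MatterRecord,
  body: string,
  transport: (matterId: string, body: string) => void,
): void {
  if (matter.status !== "conflict_cleared") {
    throw new Error(`Reply blocked for matter ${matter.id}: status is ${matter.status}`);
  }
  transport(matter.id, body);
}
```

Failing closed means a matter stuck in an unexpected state produces an error someone has to look at, rather than a reply that went out before the check.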

The extraction schema

The agent's job at every turn is to update a structured object. We pass the conversation through a model with structured output enforced. The schema is short on purpose. Every field maps to a downstream consumer: the case management system, the conflict-check job, the paralegal queue.

{
  "matter_type": "arbeidsrecht | huurrecht | familierecht | ondernemingsrecht | other | unknown",
  "language": "nl | en",
  "client": {
    "name": "string | null",
    "email": "string | null",
    "phone": "string | null"
  },
  "counterparty": {
    "name": "string | null",
    "type": "person | company | government | unknown"
  },
  "dates": [
    { "raw": "string", "iso": "string | null", "kind": "zitting | termijn | beslag | other" }
  ],
  "amounts": [
    { "raw": "string", "eur": "number | null" }
  ],
  "case_refs": ["string"],
  "summary": "string, max 280 chars, factual, no advice"
}

The model fills what it sees and leaves the rest null. The schema is enforced at the tool-call layer, so the model does not get to ramble back in prose. When an agent is allowed to produce free text, downstream code starts pattern-matching strings, and the system gets brittle within weeks. Structured outputs are not a nice-to-have here. They are the contract between the model and everything that runs after it.
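Even with structured output enforced at the tool-call layer, it can be worth validating the object again on the receiving side before any downstream consumer reads it. Below is a minimal hand-written guard, assuming the field names and enums from the schema above and (our assumption) that `iso` holds a plain `YYYY-MM-DD` date. `validateExtraction` is a hypothetical name; a schema-validation library would do the same job.

```typescript
// Hypothetical runtime guard for the extraction object; enums copied from the schema above.
const MATTER_TYPES = ["arbeidsrecht", "huurrecht", "familierecht", "ondernemingsrecht", "other", "unknown"];
const COUNTERPARTY_TYPES = ["person", "company", "government", "unknown"];
const DATE_KINDS = ["zitting", "termijn", "beslag", "other"];
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/; // assumption: iso is a calendar date without a time part

// Returns a list of problems; an empty list means the object can go downstream.
function validateExtraction(input: unknown): string[] {
  if (typeof input !== "object" || input === null) return ["root: expected an object"];
  const x = input as Record<string, any>;
  const errors: string[] = [];
  const oneOf = (v: unknown, allowed: string[], path: string) => {
    if (typeof v !== "string" || !allowed.includes(v)) errors.push(`${path}: unexpected value`);
  };
  const stringOrNull = (v: unknown, path: string) => {
    if (v !== null && typeof v !== "string") errors.push(`${path}: expected string or null`);
  };

  oneOf(x.matter_type, MATTER_TYPES, "matter_type");
  oneOf(x.language, ["nl", "en"], "language");
  for (const k of ["name", "email", "phone"]) stringOrNull(x.client?.[k], `client.${k}`);
  stringOrNull(x.counterparty?.name, "counterparty.name");
  oneOf(x.counterparty?.type, COUNTERPARTY_TYPES, "counterparty.type");

  if (!Array.isArray(x.dates)) {
    errors.push("dates: expected an array");
  } else {
    x.dates.forEach((d: any, i: number) => {
      if (typeof d?.raw !== "string") errors.push(`dates[${i}].raw: expected string`);
      const iso = d?.iso;
      if (iso !== null && !(typeof iso === "string" && ISO_DATE.test(iso))) {
        errors.push(`dates[${i}].iso: expected YYYY-MM-DD or null`);
      }
      oneOf(d?.kind, DATE_KINDS, `dates[${i}].kind`);
    });
  }

  if (!Array.isArray(x.amounts)) {
    errors.push("amounts: expected an array");
  } else {
    x.amounts.forEach((a: any, i: number) => {
      if (typeof a?.raw !== "string") errors.push(`amounts[${i}].raw: expected string`);
      const eur = a?.eur;
      if (eur !== null && !(typeof eur === "number" && Number.isFinite(eur))) {
        errors.push(`amounts[${i}].eur: expected number or null`);
      }
    });
  }

  if (!Array.isArray(x.case_refs) || !x.case_refs.every((r: unknown) => typeof r === "string")) {
    errors.push("case_refs: expected an array of strings");
  }
  // .length counts UTF-16 code units, close enough to "280 chars" for Dutch and English text.
  if (typeof x.summary !== "string" || x.summary.length > 280) {
    errors.push("summary: expected a string of at most 280 characters");
  }
  return errors;
}
```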

The trigger detector

Once the extraction object is updated, the detector runs. It is plain code, not a model. The point of using code here is auditability. A partner can read the detector in thirty seconds and understand exactly when the firm gets paged.

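// The subset of the extraction object the detector reads.
type Intake = {
  counterparty: { name: string | null };
  dates: { iso: string | null; kind: string }[];
  amounts: { eur: number | null }[];
  case_refs: string[];
};

// Date kinds that imply a procedural clock.
const DEADLINE_KINDS = new Set(["zitting", "termijn", "beslag"]);
const MS_PER_DAY = 86_400_000;

export function shouldHandoff(
  intake: Intake,
  now: Date,
  deadlineWindowDays = 21, // the firm's choice, not a legal rule
): {
  handoff: boolean;
  reasons: string[];
} {
  const reasons: string[] = [];

  // Any named counterparty: the conflict check has to run before anyone replies.
  if (intake.counterparty.name) {
    reasons.push(`counterparty:${intake.counterparty.name}`);
  }

  // Deadline-type dates inside the window. Past dates (negative days) also pass
  // this check; a malformed iso string parses to NaN and never fires.
  for (const d of intake.dates) {
    if (!d.iso) continue; // unresolved dates are context only, not a trigger
    const days = (new Date(d.iso).getTime() - now.getTime()) / MS_PER_DAY;
    if (DEADLINE_KINDS.has(d.kind) && days <= deadlineWindowDays) {
      reasons.push(`deadline:${d.kind}:${Math.round(days)}d`);
    }
  }

  // Euro amounts at or above the threshold.
  for (const a of intake.amounts) {
    if (a.eur !== null && a.eur >= 1000) {
      reasons.push(`amount:${a.eur}`);
    }
  }

  // Any case reference; only the first is reported.
  if (intake.case_refs.length > 0) {
    reasons.push(`case_ref:${intake.case_refs[0]}`);
  }

  return { handoff: reasons.length > 0, reasons };
}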

The 21-day window for hearing dates is the firm's choice, not a legal one. They wanted any zitting inside three weeks to ring the bell. Other firms have different windows. It is a config value, not a constant.
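Treating the window as configuration could look like the sketch below: per-firm overrides merged over defaults that match the detector, with nonsensical values rejected at load time. The `HandoffConfig` fields are hypothetical names, and where the overrides come from (a JSON file, environment variables, an admin screen) is left open.

```typescript
// Hypothetical per-firm handoff configuration; field names are illustrative.
type HandoffConfig = {
  deadlineWindowDays: number;
  minAmountEur: number;
  deadlineKinds: string[];
};

// Defaults matching the values used by the detector.
const DEFAULT_HANDOFF_CONFIG: HandoffConfig = {
  deadlineWindowDays: 21,
  minAmountEur: 1000,
  deadlineKinds: ["zitting", "termijn", "beslag"],
};

// Merge a firm's overrides over the defaults and fail loudly on bad values,
// so a typo in config cannot quietly switch a trigger off.
function loadHandoffConfig(overrides: Partial<HandoffConfig>): HandoffConfig {
  const cfg: HandoffConfig = { ...DEFAULT_HANDOFF_CONFIG, ...overrides };
  if (!Number.isInteger(cfg.deadlineWindowDays) || cfg.deadlineWindowDays < 1) {
    throw new Error(`deadlineWindowDays must be a positive integer, got ${cfg.deadlineWindowDays}`);
  }
  if (!Number.isFinite(cfg.minAmountEur) || cfg.minAmountEur < 0) {
    throw new Error(`minAmountEur must be a non-negative number, got ${cfg.minAmountEur}`);
  }
  if (!Array.isArray(cfg.deadlineKinds) || cfg.deadlineKinds.length === 0) {
    throw new Error("deadlineKinds must list at least one kind");
  }
  // Copy the array so callers cannot mutate the shared defaults.
  return { ...cfg, deadlineKinds: [...cfg.deadlineKinds] };
}
```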

Dutch-language specifics

The agent is bilingual NL/EN by default. About 30% of the firm's intake is in English (expats, cross-border employment, IP disputes). We did not train anything. We prompted the model to detect the language on the first message and lock to it. The schema fills the same way either way; the agent's replies match the client.

The harder problem is Dutch date parsing. The model handles "12 juni 2026" fine. It handles "aanstaande donderdag" ("this coming Thursday") less well, especially in a multi-turn conversation where the client might have typed it three days earlier. We solved this by stamping the conversation start time into the system prompt and instructing the model to resolve relative dates to ISO immediately, rather than carry them forward as relative text. If the model is uncertain, the schema's iso field stays null and the date is treated as raw context only, not as a trigger.
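The article's fix is on the prompt side: the model resolves relative dates against the stamped start time. A deterministic resolver for a handful of common phrases is one possible cross-check; the sketch below is an assumption, not the firm's code. It covers "morgen", "overmorgen", "aanstaande" plus a weekday, "over N dagen/weken", and ordinals such as "de 18e", returns null for anything else (which keeps the date out of the trigger path), and works in UTC calendar dates for simplicity where a production version would use Europe/Amsterdam time.

```typescript
// Hypothetical resolver for a few Dutch relative-date phrases, in UTC calendar dates.
const WEEKDAYS_NL = ["zondag", "maandag", "dinsdag", "woensdag", "donderdag", "vrijdag", "zaterdag"]; // index = getUTCDay()
const NUMBER_WORDS_NL: Record<string, number | undefined> = {
  een: 1, "één": 1, twee: 2, drie: 3, vier: 4, vijf: 5, zes: 6, zeven: 7, acht: 8, negen: 9, tien: 10,
};

const toIsoDate = (d: Date): string => d.toISOString().slice(0, 10);
const addDaysUtc = (base: Date, days: number): Date =>
  new Date(Date.UTC(base.getUTCFullYear(), base.getUTCMonth(), base.getUTCDate() + days));

// Returns an ISO date, or null when the phrase is not understood.
function resolveRelativeDateNl(phrase: string, start: Date): string | null {
  const p = phrase.trim().toLowerCase();
  if (p === "morgen") return toIsoDate(addDaysUtc(start, 1));
  if (p === "overmorgen") return toIsoDate(addDaysUtc(start, 2));

  let m = /^aanstaande (\S+)$/.exec(p);
  if (m) {
    const target = WEEKDAYS_NL.indexOf(m[1]);
    if (target < 0) return null;
    // Next occurrence strictly after the start day (1..7 days ahead).
    const delta = ((target - start.getUTCDay() + 6) % 7) + 1;
    return toIsoDate(addDaysUtc(start, delta));
  }

  m = /^over (\S+) (dagen|dag|weken|week)$/.exec(p);
  if (m) {
    const n = /^\d+$/.test(m[1]) ? Number(m[1]) : NUMBER_WORDS_NL[m[1]];
    if (n === undefined) return null;
    return toIsoDate(addDaysUtc(start, m[2].startsWith("dag") ? n : n * 7));
  }

  // Ordinal day of month, e.g. "de 18e". Assumption: a day already past means next month.
  m = /^de (\d{1,2})e$/.exec(p);
  if (m) {
    const day = Number(m[1]);
    const monthOffset = day >= start.getUTCDate() ? 0 : 1;
    const d = new Date(Date.UTC(start.getUTCFullYear(), start.getUTCMonth() + monthOffset, day));
    return d.getUTCDate() === day ? toIsoDate(d) : null; // rejects e.g. "de 31e" in a 30-day month
  }
  return null;
}
```

The weekday rule is a design choice: "aanstaande donderdag" said on a Thursday resolves to the following week, which is the usual reading but not the only one.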

The handoff itself

When the detector returns handoff: true, the agent does three things, in order:

1. It tells the client, plainly: "I'm going to pause here and have a colleague look at this. You'll hear from us within [the firm's SLA]." No mention of triggers, no mention of AI, no mention of urgency. The client does not need to know how the sausage gets made.
2. It writes the extraction object and the trigger reasons to the case management system. The matter is created in "Awaiting conflict check" status.
3. It pages the on-call paralegal through the firm's existing Slack channel with a one-paragraph summary, the structured object, and a link to the full transcript.
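The three steps map onto a short sequential function. This is a sketch: `HandoffDeps` is a hypothetical stand-in for whatever chat, case-management and Slack integrations a firm already has, and the client message reuses the article's English wording with the SLA passed in as text (a Dutch conversation would get a Dutch equivalent).

```typescript
// Hypothetical ports onto the firm's chat widget, case management system and Slack.
interface HandoffDeps {
  sendToClient(text: string): Promise<void>;
  // Creates the matter in "Awaiting conflict check" status and returns its id.
  createMatter(extraction: unknown, reasons: string[]): Promise<string>;
  pageParalegal(matterId: string, summary: string, transcriptUrl: string): Promise<void>;
}

type ConversationState = { paused: boolean };

async function runHandoff(
  state: ConversationState,
  extraction: { summary: string },
  reasons: string[],
  transcriptUrl: string,
  slaText: string,
  deps: HandoffDeps,
): Promise<string> {
  // Pause first so no further agent turn can go out while the steps below run.
  state.paused = true;
  // Step 1: a plain message to the client; no triggers, no AI, no urgency.
  await deps.sendToClient(
    `I'm going to pause here and have a colleague look at this. You'll hear from us within ${slaText}.`,
  );
  // Step 2: persist the extraction object and the trigger reasons.
  const matterId = await deps.createMatter(extraction, reasons);
  // Step 3: page the on-call paralegal with the summary and transcript link.
  await deps.pageParalegal(matterId, extraction.summary, transcriptUrl);
  return matterId;
}
```

Because the steps are awaited in order, a failed case-management write means nobody is paged; a production version would want to page a human even when that step fails.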

The Slack message is the part the paralegals actually see. We spent more time on its formatting than on the model prompt. The first line is the trigger reason in plain Dutch. The second line is the deadline if there is one. The rest is collapsed in a thread. A paralegal can triage twenty of these in the time it used to take to read one email.
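A sketch of that two-line layout, assuming the reason strings produced by `shouldHandoff` above (`counterparty:...`, `deadline:<kind>:<n>d`, `amount:...`, `case_ref:...`). The Dutch phrasings are illustrative placeholders, not the firm's template, and the thread replies follow the article's list: summary, structured object, transcript link. Function names are hypothetical.

```typescript
// Hypothetical Slack draft builder working from the detector's reason strings.
type SlackDraft = { text: string; threadReplies: string[] };

const DEADLINE_RE = /^deadline:(\w+):(-?\d+)d$/;

// One reason string -> one line of plain Dutch (wording is illustrative).
function describeReasonNl(reason: string): string {
  const d = DEADLINE_RE.exec(reason);
  if (d) {
    const n = Number(d[2]);
    return n >= 0 ? `${d[1]} over ${n} dagen` : `${d[1]} ${-n} dagen geleden`;
  }
  const i = reason.indexOf(":");
  if (i < 0) return reason;
  const kind = reason.slice(0, i);
  const value = reason.slice(i + 1); // values such as ECLI refs contain ':' themselves
  switch (kind) {
    case "counterparty": return `Wederpartij genoemd: ${value} (conflictcheck nodig)`;
    case "amount": return `Bedrag genoemd: €${value}`;
    case "case_ref": return `Zaaknummer genoemd: ${value}`;
    default: return reason;
  }
}

function formatSlackDraft(reasons: string[], summary: string, extractionJson: string, transcriptUrl: string): SlackDraft {
  const daysOf = (r: string) => Number(DEADLINE_RE.exec(r)?.[2]);
  const deadlines = reasons.filter((r) => DEADLINE_RE.test(r)).sort((a, b) => daysOf(a) - daysOf(b));
  const others = reasons.filter((r) => !DEADLINE_RE.test(r));
  // Line 1: why this was paged. Line 2: the nearest deadline, if there is one.
  const lines = [others.length > 0 ? others.map(describeReasonNl).join("; ") : "Termijn in intake"];
  if (deadlines.length > 0) lines.push(describeReasonNl(deadlines[0]));
  // Everything else is posted as thread replies under the main message.
  return { text: lines.join("\n"), threadReplies: [summary, extractionJson, transcriptUrl] };
}
```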

How we tested it before it touched real intake

We replayed roughly 800 historical messages from the firm's last eighteen months of inbox, anonymised in place. For each message we already knew what the human had done: replied within an hour, replied next day, escalated to a partner, missed entirely. We ran the agent and detector against the raw text and scored its output against the human record.
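Scoring a replay comes down to a confusion matrix. The sketch below assumes each historical message has been labelled with whether a handoff should have happened, derived from the human record, and uses the standard definitions: miss rate is missed urgent messages over all urgent ones, false-positive rate is needless pages over all non-urgent ones. `EvalCase` and `scoreEval` are hypothetical names.

```typescript
// Hypothetical replay record: did the detector fire, and should it have, per the human record?
type EvalCase = { id: string; detectorFired: boolean; expectedHandoff: boolean };
type EvalScore = { accuracy: number; missRate: number; falsePositiveRate: number; missedIds: string[] };

function scoreEval(cases: EvalCase[]): EvalScore {
  let tp = 0, tn = 0, fp = 0, fn = 0;
  const missedIds: string[] = [];
  for (const c of cases) {
    if (c.expectedHandoff && c.detectorFired) tp++;
    else if (c.expectedHandoff) { fn++; missedIds.push(c.id); } // the costly error
    else if (c.detectorFired) fp++;                              // a needless page
    else tn++;
  }
  // Guard against empty denominators on small or one-sided eval sets.
  const ratio = (num: number, den: number) => (den === 0 ? 0 : num / den);
  return {
    accuracy: ratio(tp + tn, cases.length),
    missRate: ratio(fn, tp + fn),
    falsePositiveRate: ratio(fp, fp + tn),
    missedIds,
  };
}
```

Keeping `missedIds` alongside the rates makes it easy to read the actual messages behind a miss instead of only watching the number move.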

Day-one accuracy was around 91%. The misses clustered around two failure modes: counterparties referred to obliquely ("mijn baas", my boss; "de verhuurder van mijn moeder", my mother's landlord) and dates written as Dutch ordinals ("de 18e", the 18th). We added a counterparty-resolution pass that asks one targeted follow-up question when the extraction object has a matter type but no counterparty name. We also added more Dutch date patterns to the schema description. Six weeks in, the miss rate on a rolling thirty-day window is under 2% and the false-positive rate is under 5%.
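The counterparty-resolution pass reduces to a small rule. In this sketch "has a matter type" is read as `matter_type !== "unknown"`, an `alreadyAsked` flag enforces the single follow-up, and the question wording in both languages is hypothetical.

```typescript
// Hypothetical follow-up rule; only the trigger condition comes from the article.
type FollowUpInput = {
  matter_type: string;
  language: "nl" | "en";
  counterparty: { name: string | null };
};

// Returns the one follow-up question to ask, or null if none is needed.
function counterpartyFollowUp(x: FollowUpInput, alreadyAsked: boolean): string | null {
  const hasMatterType = x.matter_type !== "unknown";
  if (!hasMatterType || x.counterparty.name !== null || alreadyAsked) return null;
  return x.language === "nl"
    ? "Wie is de andere partij in deze zaak, bijvoorbeeld uw werkgever, uw verhuurder of een gemeente?"
    : "Who is the other party in this matter, for example your employer, your landlord or a municipality?";
}
```

Turning "mijn baas" into a name still depends on the client's answer; the rule only guarantees the question is asked, and asked once.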

None of that is impressive on its own. What matters is that the eval set is checked into the repository, every model or prompt change has to beat the last accuracy number to ship, and when a regression slips in, we know within a day. Recent stories about models introducing regressions into well-trodden code land hard because most teams shipping with LLMs have no eval harness at all. If the model is in your critical path, the eval set is the only thing standing between you and a Friday you will not enjoy.
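The ship rule can be a few lines in CI. A sketch, assuming the baseline accuracy is stored next to the eval set; "beat" is taken literally, so a tie blocks, and the comparison is written so that a NaN from a broken eval run also blocks. `shipDecision` is a hypothetical name.

```typescript
// Hypothetical CI gate: a change ships only if its accuracy strictly beats the baseline.
function shipDecision(baseline: number, candidate: number): { ship: boolean; message: string } {
  // !(candidate > baseline) is also true when either value is NaN, so a broken run blocks.
  if (!(candidate > baseline)) {
    return { ship: false, message: `blocked: ${candidate} does not beat baseline ${baseline}` };
  }
  return { ship: true, message: `ok: ${candidate} beats baseline ${baseline}` };
}
```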

Audit logging

Dutch law firms run under the supervision of the Nederlandse Orde van Advocaten, and the firm we built this for wanted to be able to demonstrate, on demand, exactly what the agent had said to any prospective client. So every conversation is stored end-to-end, including the system prompt version, the model id, the temperature, and the extraction object at every turn. If the firm is ever asked "did your AI give legal advice in this conversation," they can produce the full record in under a minute.
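A per-turn audit record covering the fields listed above might look like this. The type and the in-memory `AuditLog` class are hypothetical; a real deployment would write to durable, access-controlled storage. Serialising the extraction object at write time keeps later mutations of that object from rewriting history.

```typescript
// Hypothetical audit record: one entry per turn, with the fields the firm wanted on file.
type AuditTurn = {
  conversationId: string;
  turn: number;
  at: string; // ISO 8601 timestamp
  role: "client" | "agent";
  text: string;
  systemPromptVersion: string;
  modelId: string;
  temperature: number;
  extractionJson: string; // extraction object serialised after this turn
};

// In-memory stand-in for durable storage. Append-only: there is no update or delete method.
class AuditLog {
  private readonly turns: AuditTurn[] = [];

  append(entry: AuditTurn): void {
    this.turns.push(Object.freeze({ ...entry }));
  }

  // Full record of one conversation, in turn order.
  conversation(conversationId: string): AuditTurn[] {
    return this.turns
      .filter((t) => t.conversationId === conversationId)
      .sort((a, b) => a.turn - b.turn);
  }
}
```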

Because intake conversations contain personal data, the audit log lives behind the same access controls as the case management system, with retention windows aligned to the firm's existing data-protection policy. The Autoriteit Persoonsgegevens has been clear that AI processing of personal data does not get a special exemption from the AVG (the Dutch name for the GDPR): same lawful basis, same data minimisation, same subject rights as any other processing. We log enough to defend the system. We do not log more.

We also log every case where the detector did not fire but the human later flagged the matter as urgent. That log is the agent's real benchmark. The accuracy number that matters is not "did the model extract the date correctly," it is "did the detector miss anything the paralegal would have caught."

What we actually built, and what we'd change

When we built this for an Amsterdam employment-law boutique, we ran into one problem: the model would occasionally re-summarise the client's situation in a way that sounded like it was confirming a legal position. "Het klinkt alsof je een ontslag op staande voet hebt gekregen." ("It sounds like you were summarily dismissed.") Technically a paraphrase, functionally a diagnosis. We ended up adding a final pass that rewrites the agent's outgoing message into pure question mode whenever the extraction object contains a non-null matter_type. The agent is allowed to know what kind of matter it is; it is not allowed to tell the client. We apply the same kind of careful scoping across all our AI agent work.
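The rewrite itself is a model pass, but its output can be checked cheaply before it goes out. A sketch under stated assumptions: `"unknown"` stands in for an unset `matter_type` (the schema has no null for that field), and a message counts as question-only when every sentence ends in a question mark. Both function names are hypothetical.

```typescript
// Hypothetical post-check for the question-mode rewrite pass.
// "unknown" stands in for an unset matter_type, since the schema has no null for it.
function needsQuestionMode(matterType: string): boolean {
  return matterType !== "unknown";
}

// True only if every sentence ends with "?". Naive splitting on . ! ? also splits
// "€12.500" or "bijv.", which errs toward flagging a message for another rewrite.
function isQuestionOnly(message: string): boolean {
  const sentences = (message.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  return sentences.length > 0 && sentences.every((s) => s.endsWith("?"));
}
```

The check looks at form only. A leading question such as "Klopt het dat u op staande voet bent ontslagen?" ("Is it right that you were summarily dismissed?") passes it while still sounding like a diagnosis, so it backs up the rewrite pass rather than replacing it.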

If you run intake at a firm of any size, the smallest thing you can do today is this: open the last fifty messages that landed in your general inbox, mark the ones that contained a date, an amount, or a counterparty name in the first paragraph, and count what fraction your team caught within four hours. That number is the gap an agent is going to close.

Key takeaway

An intake agent is not there to answer legal questions. It is there to make sure a human sees the urgent ones before the deadline does.

FAQ

Does the chat agent give legal advice?

No. It collects facts, asks clarifying questions, and hands off to a human. It never tells a prospective client whether they have a case, what statute applies, or what to do next.

Which languages does the agent support?

Dutch and English by default. It detects the language on the first message and locks to it. The structured extraction schema fills the same way regardless of language.

What happens after a handoff is triggered?

The matter is created in 'Awaiting conflict check' status, an on-call paralegal is paged in Slack with a structured summary, and the agent stops replying until a human takes over.

How do you measure detector accuracy?

An eval set of historical intake messages, scored against what the firm actually did. Every prompt or model change has to beat the last accuracy number before it ships.

Is the chat transcript discoverable in litigation?

Treat it the same as any other client communication. The firm stores conversations end-to-end with full system prompt, model id, and extraction state for exactly this reason.

