Legal intake chat agent: how a state machine doubled bookings

A Haarlem online legal advisor's intake chat agent stalled at 11 qualified consultations a week. A five-stage state machine pushed it to 23 in a month.

Jacob Molkenboer · Founder · A Brand New Company · 9 Jun 2026 · 9 min

An intake queue at 19:42

It is a Tuesday evening in February. The intake lead at a 24-person online legal advisor in Haarlem has fourteen chat sessions open across three browser windows. Eleven of them will end without a booked consultation. Two will book, then no-show because the chat never asked for a phone number. One will book and become a paying file.

The firm runs a public-facing chat widget on every landing page (labour law, tenancy, debt collection, family). The widget had been live for nine weeks. It was built on a single 1,800-token system prompt that tried to do everything in one shot: greet, qualify, collect facts, check jurisdiction, propose a slot, and email the handling lawyer. It averaged eleven qualified consultations a week. Conversion from chat start to booking sat around 9%.

We were asked to look at the prompt. We replaced the prompt with a five-stage state machine. Four weeks later the same widget was producing 23 qualified consultations a week. Same traffic, same lawyers, same calendar. The story of how that happened is mostly a story about giving the model less to think about, not more.

Why one big prompt stalled

The original setup looked sensible on paper. A long system prompt, tool calls for the calendar API, a few guardrails ("if the user asks for legal advice, do not provide it, book a paid consultation instead"). In practice it failed in four predictable ways.

First, the model would skip questions. A user types "I got fired yesterday and they're not paying my last month." A monolithic agent often jumps straight to "let me book you with our labour-law team" before asking whether the employment was Dutch-jurisdiction, whether the user already had a lawyer, or what the employer's company name was. The lawyer then opens the booking with no usable facts.

Second, the model would re-ask questions. If the user volunteered a phone number in turn two and the agent did not extract it cleanly, turn seven would ask "what's a good phone number to reach you on?" Users churned at that point.

Third, the model would freelance legal opinions despite the guardrail. A 1,800-token prompt is long enough that "do not give legal advice" gets diluted by the eight other things the prompt is asking the model to do. We saw transcripts where the agent told a tenant their rent increase was probably unlawful. That is a compliance problem before it is a UX problem.

Fourth, the handoff to the lawyer was unstructured. The lawyer received a calendar invite and a link to the chat transcript. Reading 40 turns of back-and-forth before a 30-minute consultation is not a workflow that survives a busy week.

The five-stage state machine

What replaced the prompt was, structurally, the same kind of thing you would build for a multi-step form. The conversation moves through five named stages. Each stage has its own short system prompt (200 to 400 tokens), its own required-fields schema, and its own exit condition. The model cannot advance to the next stage until the current stage's schema is filled. The model cannot give legal advice in any stage because no stage's prompt asks it to.

The five stages:

Triage. Classify the matter into one of seven practice areas. Detect urgency (deadline within 14 days, court date, eviction notice). Exit when practice_area and urgency_band are set.
Facts. Collect the minimum facts the lawyer needs to open the file: who is involved, what happened, when, and where. Exit when parties, event_date, and a one-paragraph summary are set.
Jurisdiction and conflict. Confirm Dutch jurisdiction. Ask whether the user already has a lawyer on this matter, and whether the opposing party is a name the firm cannot act against. Exit when both checks pass.
Contact and consent. Capture name, email, phone, and explicit consent to the firm's privacy notice. Exit when all four are validated (email regex, NL phone format, name not empty, consent timestamp set).
Booking. Read the lawyer's calendar via the practice management API, propose three slots, confirm one. Exit when a booking ID is returned.
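
The exit gate described above can be sketched as a required-fields table plus a check that refuses to advance while anything is missing. This is a minimal sketch, not the firm's implementation. The triage and facts field names come from the list above; the field names for the jurisdiction, contact, and booking stages (jurisdiction_ok, conflict_clear, consent_timestamp, booking_id) are assumptions made for illustration.

```typescript
type Stage = "triage" | "facts" | "jurisdiction" | "contact" | "booking";

const STAGE_ORDER: Stage[] = ["triage", "facts", "jurisdiction", "contact", "booking"];

// Field names for the last three stages are hypothetical.
const REQUIRED_FIELDS: Record<Stage, string[]> = {
  triage: ["practice_area", "urgency_band"],
  facts: ["parties", "event_date", "summary"],
  jurisdiction: ["jurisdiction_ok", "conflict_clear"],
  contact: ["name", "email", "phone", "consent_timestamp"],
  booking: ["booking_id"],
};

// A field counts as filled when it holds a usable value. `false` is treated
// as unfilled so that a failed jurisdiction or conflict check blocks the exit.
function isFilled(value: unknown): boolean {
  if (value === undefined || value === null || value === false) return false;
  if (typeof value === "string") return value.trim().length > 0;
  if (Array.isArray(value)) return value.length > 0;
  return true;
}

// Names of the required fields the stage still lacks.
function missingFields(stage: Stage, data: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS[stage].filter((field) => !isFilled(data[field]));
}

function nextStageAfter(stage: Stage): Stage | "done" {
  const i = STAGE_ORDER.indexOf(stage);
  return i === STAGE_ORDER.length - 1 ? "done" : STAGE_ORDER[i + 1];
}

// Stay in the current stage until its form is complete.
function advance(stage: Stage, data: Record<string, unknown>): Stage | "done" {
  return missingFields(stage, data).length === 0 ? nextStageAfter(stage) : stage;
}
```

Keeping this check in ordinary code rather than in the prompt means the model can propose an exit, but the orchestrator decides whether it happens.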

Each stage runs as its own LLM call. The previous stage's structured output becomes context for the next stage's prompt. The conversation history is carried forward (the user sees one continuous chat), but the model behind the scenes is, at any given moment, only being asked to do one thing.
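
As one concrete exit condition, the contact-and-consent checks named in the stage list (email regex, NL phone format, name not empty, consent timestamp set) might look like the sketch below. The regular expressions and the ContactData shape are assumptions, not the patterns used in production. The phone pattern in particular only accepts ten-digit national numbers and their +31 / 0031 equivalents.

```typescript
interface ContactData {
  name?: string;
  email?: string;
  phone?: string;
  consent_timestamp?: string; // ISO 8601, set when the user accepts the privacy notice
}

// Deliberately loose: one "@", no whitespace, at least one dot in the domain.
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Assumed NL format: +31, 0031, or a leading 0, followed by nine digits
// of which the first is not 0 (for example 0612345678 or +31612345678).
const NL_PHONE_RE = /^(?:\+31|0031|0)[1-9]\d{8}$/;

// Strip spaces, dashes, dots, and parentheses before matching.
function normalizePhone(raw: string): string {
  return raw.replace(/[\s\-().]/g, "");
}

// Returns the names of the fields that fail validation; empty means the stage may exit.
function contactErrors(c: ContactData): string[] {
  const errors: string[] = [];
  if (!c.name || c.name.trim() === "") errors.push("name");
  if (!c.email || !EMAIL_RE.test(c.email.trim())) errors.push("email");
  if (!c.phone || !NL_PHONE_RE.test(normalizePhone(c.phone))) errors.push("phone");
  if (!c.consent_timestamp || Number.isNaN(Date.parse(c.consent_timestamp))) {
    errors.push("consent_timestamp");
  }
  return errors;
}
```

Returning the failing field names, rather than a boolean, lets the stage prompt ask only for what is still wrong.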

Takeaway

If your chat agent has more than one job, give each job its own prompt, its own schema, and its own exit condition. The model gets sharper. The handoff gets readable.

What "form extraction" actually means here

Each stage is a form. The model's job in that stage is to fill the form. The conversation is just the input method.

In practice this means every stage call uses structured output. We use the model provider's tool-use mechanism, the same shape you would get with OpenAI's structured outputs or with the schema-enforcement described in the Claude tool-use docs. The model is given a single tool, submit_stage_data, with a JSON schema that matches the stage's required fields. The system prompt tells the model: ask the user whatever questions you need, and when you have enough information to fill the schema, call the tool. Do not call the tool with empty or guessed fields.
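
For illustration, the tool definition for the triage stage could look roughly like this. It is a sketch under assumptions: the article names only four of the firm's seven practice areas, so only those four appear here, and the two urgency_band values are invented for the example.

```typescript
// The four areas named in the article; the firm's full list has seven.
const PRACTICE_AREAS: string[] = ["labour_law", "tenancy", "debt_collection", "family"];

// Hypothetical urgency bands; the article does not list the actual values.
const URGENCY_BANDS: string[] = ["urgent", "standard"];

const triageTool = {
  name: "submit_stage_data",
  description:
    "Submit completed data for stage: triage. " +
    "Call only when every field is known from the conversation.",
  input_schema: {
    type: "object",
    properties: {
      practice_area: { type: "string", enum: PRACTICE_AREAS },
      urgency_band: { type: "string", enum: URGENCY_BANDS },
    },
    required: ["practice_area", "urgency_band"],
    additionalProperties: false,
  },
};
```

Constraining practice_area to an enum is what turns "classify the matter" from free text into a form field the next stage can rely on.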

The pattern in code, roughly:

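import Anthropic from "@anthropic-ai/sdk";

// Reads ANTHROPIC_API_KEY from the environment.
const client = new Anthropic();

type Message = Anthropic.MessageParam;

type Stage =
  | "triage" | "facts" | "jurisdiction" | "contact" | "booking";

interface StageResult {
  stage: Stage;
  data: Record<string, unknown>;
  next_stage: Stage | "done";
  reply: string; // assistant text to show the user, if any
}

// STAGE_DEFS (per-stage system prompt and JSON schema) and nextStageAfter
// live elsewhere in the orchestration module.

async function runStage(
  stage: Stage,
  history: Message[],
  carry: Record<string, unknown>,
): Promise<StageResult> {
  const { system, schema } = STAGE_DEFS[stage];

  const res = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 1024, // required by the Messages API
    system: system + "\n\nKnown so far:\n" + JSON.stringify(carry),
    messages: history,
    tools: [{
      name: "submit_stage_data",
      description: `Submit completed data for stage: ${stage}`,
      input_schema: schema,
    }],
    tool_choice: { type: "auto" },
  });

  // Any plain text the model produced this turn.
  const reply = res.content
    .flatMap(b => (b.type === "text" ? [b.text] : []))
    .join("");

  // If the model called the tool, the stage is complete.
  const toolUse = res.content.find(
    (b): b is Anthropic.ToolUseBlock => b.type === "tool_use",
  );
  if (toolUse) {
    return {
      stage,
      data: toolUse.input as Record<string, unknown>,
      next_stage: nextStageAfter(stage),
      reply,
    };
  }

  // Otherwise the model asked a clarifying question. Pass it to the user.
  return { stage, data: {}, next_stage: stage, reply };
}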

The orchestration loop sits in front of this. It keeps the user-facing chat window happy (one continuous thread, with typing indicators between turns) while routing each turn to whichever stage handler is current. When a stage exits, the carry object grows by whatever that stage submitted, and the next stage's system prompt sees it.
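
A minimal sketch of that loop, assuming a Session object that holds the current stage, the carry, and the history. The Session shape and the injected StageRunner are hypothetical. The runner stands in for the stage handler so the routing logic can be exercised without a model call, and handling of the assistant's reply text is left out to keep the sketch short.

```typescript
type Stage = "triage" | "facts" | "jurisdiction" | "contact" | "booking";

interface StageResult {
  stage: Stage;
  data: Record<string, unknown>;
  next_stage: Stage | "done";
}

interface ChatTurn {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical per-conversation state kept by the orchestrator.
interface Session {
  stage: Stage | "done";
  carry: Record<string, unknown>;
  history: ChatTurn[];
}

type StageRunner = (
  stage: Stage,
  history: ChatTurn[],
  carry: Record<string, unknown>,
) => Promise<StageResult>;

// Pure state transition: grow the carry only when the stage actually exited.
function applyResult(session: Session, result: StageResult): Session {
  const exited = result.next_stage !== result.stage;
  return {
    ...session,
    stage: result.next_stage,
    carry: exited ? { ...session.carry, ...result.data } : session.carry,
  };
}

// Route one user turn to whichever stage is current.
async function handleTurn(
  session: Session,
  userText: string,
  run: StageRunner,
): Promise<Session> {
  const stage = session.stage;
  if (stage === "done") return session;
  const withUser: Session = {
    ...session,
    history: [...session.history, { role: "user", content: userText }],
  };
  const result = await run(stage, withUser.history, withUser.carry);
  return applyResult(withUser, result);
}
```

Separating applyResult from the model call keeps the state transition easy to unit-test, which matters more here than in a single-prompt agent because every stage boundary is a place a bug can hide.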

The numbers, the week after

The new setup went live alongside the old prompt, with incoming chats split 50/50 between the two. We compared the same metric the firm had been tracking, which is "qualified consultation booked" (a booking that does not get cancelled within 24 hours, against a real Dutch-jurisdiction matter, with a non-empty facts summary in the file).
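
Written as a predicate, that metric definition might look like the sketch below. The Booking shape is hypothetical, and "cancelled within 24 hours" is read here as within 24 hours of the booking being made, which is an assumption.

```typescript
// Hypothetical record shape; field names are illustrative only.
interface Booking {
  booked_at: Date;
  cancelled_at?: Date; // undefined if the booking was never cancelled
  dutch_jurisdiction: boolean;
  facts_summary: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Qualified = not cancelled within 24 hours of booking, Dutch-jurisdiction
// matter, and a non-empty facts summary in the file.
function isQualified(b: Booking): boolean {
  const cancelledEarly =
    b.cancelled_at !== undefined &&
    b.cancelled_at.getTime() - b.booked_at.getTime() < DAY_MS;
  return !cancelledEarly && b.dutch_jurisdiction && b.facts_summary.trim().length > 0;
}

function countQualified(bookings: Booking[]): number {
  return bookings.filter(isQualified).length;
}
```

Pinning the definition down in code before the test starts avoids arguing afterwards about which bookings count.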

The split test ran for three weeks. The state-machine variant produced 23 qualified bookings per week on average, against 11 for the prompt variant. By week three the gap was clean enough that the firm switched the prompt variant off.

Three other things moved that we had predicted would move.

Average conversation length went up. The old prompt averaged around five turns. The state machine averaged closer to eight. Users were being asked more questions, but they were the right questions.
No-show rate on booked consultations dropped because the contact-and-consent stage caught a class of users who would book a slot without confirming a phone number, then disappear.
Lawyer preparation time per consultation dropped. The handling lawyer now opens the file to a four-field summary rather than a 40-turn transcript. We did not measure this with a stopwatch, but the partner told us it was the change he noticed most.

What the trend toward bigger models gets wrong

There is a recurring argument on Hacker News this week that AI progress is slowing. The implication, usually, is that product teams should wait for the next jump. Our reading from the field is the opposite. Most agents we see in the wild are bottlenecked on product design, not on model capability. A 2026 frontier model running a 1,800-token Swiss-army-knife prompt will not outperform a 2024 model running a tight five-stage state machine. We have A/B-tested this. The state machine wins, every time, and it wins more clearly when the model is smaller.

The reason is that a state machine reduces the surface area the model has to reason over. Each call has one job, one schema, one exit condition. That structure is doing work the model would otherwise have to do for free, on every turn, with no guarantee.

One caution before you reach for this pattern. State machines are not a license to over-engineer. If your agent has one job (answer a product question from a knowledge base, for example), one prompt is correct. The five-stage shape pays off only when there is a real form being filled at the other end.

What we would do differently next time

Two things, in hindsight.

First, we built the conflict check into stage three. In practice the conflict list (firms the practice cannot act against) changes weekly. Embedding it in a prompt means every change ships through us. We should have made the conflict list a tool call against a database table the firm controls. We have since fixed that.
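
A rough sketch of the conflict check as a tool backed by a firm-owned table. Everything named here is hypothetical: the check_conflict tool, the ConflictRow shape, and the exact-match-after-normalisation rule. The in-memory set stands in for a database query, and a real conflict check would likely need fuzzier matching than this.

```typescript
// Hypothetical row in the table the firm maintains.
interface ConflictRow {
  party_name: string;
}

// Lowercase, strip diacritics, and collapse punctuation and whitespace so
// that "Café Noord B.V." and "cafe noord bv." compare more predictably.
function normalizeName(name: string): string {
  return name
    .toLowerCase()
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "")
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

// Stand-in for the database: build a lookup set from the current rows.
function buildConflictIndex(rows: ConflictRow[]): Set<string> {
  return new Set(rows.map((row) => normalizeName(row.party_name)));
}

function hasConflict(index: Set<string>, opposingParty: string): boolean {
  return index.has(normalizeName(opposingParty));
}

// Tool definition the jurisdiction stage could expose to the model.
const checkConflictTool = {
  name: "check_conflict",
  description: "Check whether the firm can act against the named opposing party.",
  input_schema: {
    type: "object",
    properties: { opposing_party: { type: "string" } },
    required: ["opposing_party"],
  },
};
```

With the list behind a tool, a weekly change to the table takes effect on the next chat without a prompt edit or a deploy.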

Second, we wrote the stage schemas in TypeScript and converted them to JSON Schema at build time. Fine for us. Painful for the firm's in-house developer who wants to add a field to the facts stage without learning our build pipeline. Next time we would let the firm own the schemas as plain JSON files in a Git repo they have commit access to.
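
If the schemas lived as plain JSON files, the orchestrator would need a small guard when loading them, so that a hand-edited file fails loudly instead of silently weakening a stage. The sketch below assumes the file content has already been read into a string. The StageSchema shape and the specific checks are illustrative, not a full JSON Schema validator.

```typescript
interface StageSchema {
  type: "object";
  properties: Record<string, unknown>;
  required: string[];
}

// Parse a stage schema from JSON text and reject obviously broken edits.
function parseStageSchema(jsonText: string): StageSchema {
  const raw: unknown = JSON.parse(jsonText);
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("schema must be a JSON object");
  }
  const s = raw as Record<string, unknown>;
  if (s.type !== "object") {
    throw new Error('schema "type" must be "object"');
  }
  if (typeof s.properties !== "object" || s.properties === null) {
    throw new Error('schema needs a "properties" object');
  }
  const properties = s.properties as Record<string, unknown>;

  const required = s.required === undefined ? [] : s.required;
  if (!Array.isArray(required) || !required.every((r) => typeof r === "string")) {
    throw new Error('"required" must be an array of strings');
  }
  const requiredNames = required as string[];

  // Every required field must actually be declared under properties.
  const undeclared = requiredNames.filter((name) => !(name in properties));
  if (undeclared.length > 0) {
    throw new Error("required fields not declared: " + undeclared.join(", "));
  }
  return { type: "object", properties, required: requiredNames };
}
```

In use, the text would come from something like a facts.json file in the firm's repository (a hypothetical path), read at startup and passed through this function before any stage runs.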

A five-minute audit you can run on your own intake agent

Open a recent transcript that did not convert. Read it as the lawyer who would have taken the call. Count: how many of the facts you would need to start that consultation are actually in the transcript? If it is fewer than four, your agent is not extracting a form. It is having a conversation. That gap is the one we closed at the Haarlem firm, and it is the gap we close most often when we build AI agents for clients whose existing widget feels close but will not tip over the line. When we built the intake agent for that practice, the thing we ran into was that "be helpful and book a meeting" is not a specification. Five named stages with five named exit conditions is.

FAQ

How long did it take to build the five-stage state machine?

About three weeks from kickoff to live A/B test. Two weeks on the stage schemas and orchestration loop, one week on the calendar integration and the conflict-check tool.

Do you need tool use to do this, or will plain JSON mode work?

Plain structured-output JSON mode works. The state-machine pattern does not depend on tool use. It depends on each stage having a strict schema and a clear exit condition.
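
Without tool use, the exit condition moves into the code that reads the model's JSON. A minimal sketch, assuming the stage prompt asks the model to reply with a JSON object once it has the data; parseStageJson and its result shape are hypothetical names.

```typescript
type ParseResult =
  | { complete: true; data: Record<string, unknown> }
  | { complete: false; missing: string[] };

// Treat the model's text as a candidate form submission. Anything that is not
// a JSON object with every required field filled leaves the stage open.
function parseStageJson(modelText: string, required: string[]): ParseResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(modelText);
  } catch {
    return { complete: false, missing: required };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { complete: false, missing: required };
  }
  const data = parsed as Record<string, unknown>;
  const missing = required.filter(
    (key) => data[key] === undefined || data[key] === null || data[key] === "",
  );
  return missing.length === 0
    ? { complete: true, data }
    : { complete: false, missing };
}
```

Either way the rule is the same: the stage exits when the schema is satisfied, not when the model says it is finished.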

Will the pattern still work on a smaller, cheaper model?

Yes. In our tests the five-stage version ran fine on smaller models because each call has a narrower job. Per-conversation cost was meaningfully lower than the monolithic prompt.

How does this prevent the agent from giving legal advice?

Structurally. No stage's system prompt asks the model to give legal advice, so there is no opening. Triage classifies, facts collects, jurisdiction confirms. None of them opine.

What happens if the user contradicts something they said earlier?

The orchestration loop allows backtracking. If a later stage detects a contradiction (for example a non-Dutch jurisdiction), it can reopen an earlier stage with the conflicting field cleared.
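
Backtracking can be sketched as a function that moves the session back to an earlier stage and clears only the fields in dispute. The Session shape, the reopenStage name, and the jurisdiction_ok field are assumptions for this example.

```typescript
type Stage = "triage" | "facts" | "jurisdiction" | "contact" | "booking";

const STAGE_ORDER: Stage[] = ["triage", "facts", "jurisdiction", "contact", "booking"];

// Hypothetical per-conversation state.
interface Session {
  stage: Stage | "done";
  carry: Record<string, unknown>;
}

// Reopen an earlier stage with the conflicting fields removed from the carry.
// Everything else the user already supplied is kept, so it is not asked again.
function reopenStage(session: Session, target: Stage, fieldsToClear: string[]): Session {
  const currentIndex =
    session.stage === "done" ? STAGE_ORDER.length : STAGE_ORDER.indexOf(session.stage);
  if (STAGE_ORDER.indexOf(target) >= currentIndex) {
    throw new Error("can only reopen a stage earlier than the current one");
  }
  const carry = { ...session.carry };
  for (const field of fieldsToClear) {
    delete carry[field];
  }
  return { stage: target, carry };
}
```

Because the reopened stage's exit condition runs again, the conversation cannot reach booking until the cleared field has been filled with a consistent answer.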
