Voice agent war story: 47 patients, one closed clinic

An operations lead's phone rings at 09:14 on a Tuesday. A patient is outside a locked dental clinic in Eindhoven. Her booking was for 09:00. Then forty-six more calls.

Jacob Molkenboer · Founder · A Brand New Company · 13 Feb 2025 · 8 min read

An operations lead at a 26-person dental chain in Eindhoven picked up her phone at 09:14 on a Tuesday in May. A patient was standing outside one of their clinics in the south of the city. The door was locked. The lights were off. Her appointment, booked three days earlier through the chain's AI voice agent, had been for 09:00.

The clinic had closed at Easter. Ten weeks earlier. The location was being prepped for a new tenant.

By the time her coffee was cold, three more patients had called from the same shuttered building. By the end of the day the count was forty-seven. Forty-seven confirmed bookings, all routed through the voice agent, all sent to a clinic that had not seen a dentist since Easter Monday. Nobody at the chain had any idea this was happening until that Tuesday.

We did not build this voice agent. The chain's previous agency had wired it up the year before, and it had been working well enough that nobody touched it. The post-mortem they hired us to write is the reason this story exists. Names and identifying details have been changed at the client's request.

What the agent thought it knew

The agent ran on a clean enough stack. It answered the chain's main line, handled the Dutch and English flows, captured insurance details, and wrote bookings straight into the central appointment system. The voice was good. The intent classifier was good. The hold music was good. Patients liked it.

Its single source of truth for which locations were active was a JSON file that synced overnight from the appointment system into the agent's RAG store. That file was the authoritative list of clinics, opening hours, services, and availability windows. Everything the agent said about a location came from that file.

```json
{
  "clinics": [
    {
      "id": "clinic-zuid",
      "name": "Tandheelkunde Eindhoven Zuid",
      "address": "Voorbeeldstraat 12, Eindhoven",
      "state": "active",
      "hours": {"mon-fri": "08:00-18:00"}
    }
  ]
}
```

The sync job that wrote that file was a tiny cron task. It ran at 03:00, fetched the active-locations list from the appointment-system API, normalized it, and pushed it into the RAG bucket. The agent picked up the new file on its next cold start. None of this was exotic. Everything was wired together the way a competent vendor would wire it together.
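We never saw the original normalize step, so what follows is a sketch under stated assumptions rather than the previous agency's code. It assumes the API returns a list of location records and maps them onto the `{"clinics": [...]}` shape shown above. `normalize_locations`, `SchemaMismatch`, and the `status_field` parameter are hypothetical names. The point is that a missing field produces an error that names the field, instead of a bare `KeyError` that someone has to decode later.

```python
from typing import Any


class SchemaMismatch(Exception):
    """Raised when a location record lacks a field the sync depends on."""


def normalize_locations(records: list[dict[str, Any]],
                        status_field: str = "state") -> dict[str, Any]:
    """Map raw API location records onto the RAG file's {"clinics": [...]} shape.

    Hypothetical sketch: the API record layout is an assumption.
    """
    clinics = []
    for rec in records:
        missing = {"id", "name", status_field} - set(rec)
        if missing:
            # Name the missing field so the log says "the schema changed",
            # not just "KeyError".
            raise SchemaMismatch(
                f"location {rec.get('id', '?')} missing {sorted(missing)}")
        clinics.append({
            "id": rec["id"],
            "name": rec["name"],
            "address": rec.get("address", ""),
            "state": rec[status_field],
            "hours": rec.get("hours", {}),
        })
    if not clinics:
        # An empty list is a failure, not "no clinics today".
        raise SchemaMismatch("API returned no locations")
    return {"clinics": clinics}
```

Raising here only helps if whatever runs the job lets the error escape, which is exactly where this story goes wrong next.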

The Pasen deploy that broke the chain

The chain's in-house IT lead had pushed a small change to the appointment-system API on the Friday before Pasen (the Dutch Easter weekend, in early April). A field name in the locations endpoint changed from status to state. A one-word rename. He mentioned the change in the in-house standup channel the following Tuesday. The voice-agent vendor was on a separate Slack workspace and never saw it.

The sync job tried to fetch the active locations on Saturday morning, hit a KeyError on status, logged the trace to a file in the container's ephemeral filesystem, and exited.

That alone would have been recoverable. But the script had a fallback. If the new file was missing or empty, the previous one stayed in place. "Defensive programming," the previous agency had written in a code comment. The intent was to survive a transient API outage. The effect was to freeze the agent's view of the world at whatever it knew on Good Friday.

That same Easter weekend, the south Eindhoven clinic locked its doors for the last time. The wind-down had been planned for months. The location record was flipped from active to inactive in the appointment system on Easter Saturday morning, right on schedule. The agent's RAG file, frozen on Friday night, still said active.

It still said active on the Tuesday in May, ten weeks later, when the first patient called from the locked door.

Warning

A voice agent with a "keep the last good file" fallback will lie to your customers with perfect confidence the moment its data source falls over. The fallback is not the bug. The missing alarm on the fallback is the bug.

Why nobody noticed for two months

This is the part that hurts. Every component in the chain had a healthy heartbeat. The voice agent was answering calls, and the dashboard said so. The appointment system was accepting writes, and the dashboard said so. The cron job was running every night, and the dashboard said so. The cron's logs were even green, because the wrapper script swallowed the inner Python error and exited with status zero on the fallback path.
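The wrapper is where the green logs came from. Here is a minimal sketch of a wrapper that does the opposite, assuming the sync runs as a separate process that cron invokes. `run_sync` is a hypothetical name. It has no try/except around the child and hands back the child's exit status unchanged, so a crash in the sync is a crash as far as cron is concerned.

```python
import subprocess
import sys


def run_sync(cmd: list[str]) -> int:
    """Run the sync command and return its exit status unchanged."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Copy the child's traceback to our own stderr, where cron mail or a
        # log shipper will see it, rather than a file on an ephemeral disk.
        sys.stderr.write(result.stderr)
    return result.returncode
```

The entry point would then be something like `sys.exit(run_sync([sys.executable, "sync_locations.py"]))`, where the script name is a placeholder. The design choice is the absence of a branch: there is no path on which a failed sync turns into a zero.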

Three layers of "everything is fine, boss" sitting on top of a sync job that had not actually transferred a byte since April 3rd.

The thing that finally caught it was a patient standing on a doorstep, not a monitor. That is not a system. That is luck.

The Google SRE book's chapter on monitoring distributed systems argues for monitoring user-visible symptoms rather than the health of individual components. This incident shows why: a dashboard built on the wrong signal can be worse than no dashboard, because it actively suppresses the instinct to check. The agent had three green lights and one stale file, and the green lights won.

The regulatory direction in the EU sharpens the cost. If your agent confidently tells a customer something untrue, the company running the agent wears the consequences. "The sync job failed" is not a defence anyone has to accept. That is reason enough to treat the data plumbing behind an agent with the same seriousness as the agent itself.

The forty-eight-hour fix

We were called in on the Wednesday morning. By Thursday evening the chain was safe again. The fix was unglamorous.

First, we replaced the silent-fallback logic with a hard failure. If the locations sync cannot fetch the live list, the file is written with a top-level "stale": true flag and a timestamp. The agent reads that flag on startup. If the file is stale, the agent refuses to confirm any booking and routes the caller to a human. A refused booking is annoying. Forty-seven wrong-address bookings are a press release.

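```python
import json
from datetime import datetime, timezone

class SyncFailed(Exception):
    """Raised instead of silently keeping the previous locations file."""

def write_locations(payload, dest):
    # dest is a pathlib.Path pointing at the agent's locations file
    if not payload.get("clinics"):
        # never silently keep the previous file
        raise SyncFailed("empty payload, refusing to overwrite")
    payload["synced_at"] = datetime.now(timezone.utc).isoformat()
    payload["stale"] = False
    dest.write_text(json.dumps(payload, indent=2))
```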
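The snippet above covers the success path. The failure path (flag the existing file as stale, and have the agent refuse to confirm bookings) could look roughly like the sketch below. `mark_stale`, `can_confirm_bookings`, and the 26-hour `MAX_AGE` are hypothetical. The limit is an assumption chosen to match a nightly sync plus a grace period.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

# Assumption: nightly sync plus a grace period; tune to the real schedule.
MAX_AGE = timedelta(hours=26)


def mark_stale(dest: Path) -> None:
    """On a failed sync, keep the old data for debugging but flag it as stale."""
    data = json.loads(dest.read_text()) if dest.exists() else {"clinics": []}
    data["stale"] = True
    # Record when the data first went stale, not the latest failed run.
    data.setdefault("stale_since", datetime.now(timezone.utc).isoformat())
    dest.write_text(json.dumps(data, indent=2))


def can_confirm_bookings(dest: Path, now: Optional[datetime] = None) -> bool:
    """Agent-side startup gate: confirm bookings only from a fresh, non-stale file."""
    try:
        data = json.loads(dest.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return False  # no readable ground truth: route the caller to a human
    # A missing "stale" key counts as stale: unproven freshness is not freshness.
    if data.get("stale", True) or not data.get("synced_at"):
        return False
    now = now or datetime.now(timezone.utc)
    return now - datetime.fromisoformat(data["synced_at"]) <= MAX_AGE
```

The two helpers are deliberately asymmetric. The writer keeps the old clinic list so someone can inspect it later. The reader treats anything it cannot prove fresh as unusable, which includes a file that is old but was never flagged.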

Second, we added a dead-man switch. The sync job pings a Healthchecks.io endpoint on a successful write. If the ping does not arrive within 26 hours, the chain's ops lead and the on-call engineer both get a push notification. Healthchecks.io is twenty euros a year. It would have caught this on the Saturday morning of Pasen weekend.
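Healthchecks.io accepts pings as plain HTTP requests to `https://hc-ping.com/<uuid>`, and appending `/fail` reports an explicit failure. Below is a minimal sketch of the ping at the end of the sync, plus the overdue rule the service evaluates on its side. It assumes a 24-hour period and a 2-hour grace, which is one way to arrive at the 26-hour window. `ping`, `is_overdue`, and the `opener` parameter are hypothetical; `opener` exists only so the function can be exercised without a network.

```python
import urllib.request
from datetime import datetime, timedelta
from typing import Callable

PING_BASE = "https://hc-ping.com"
PERIOD = timedelta(hours=24)  # the sync runs nightly at 03:00
GRACE = timedelta(hours=2)    # 24 h + 2 h = the 26-hour window


def ping(check_uuid: str, ok: bool,
         opener: Callable = urllib.request.urlopen) -> str:
    """Report the sync's outcome to the dead-man switch; returns the URL used."""
    url = f"{PING_BASE}/{check_uuid}" + ("" if ok else "/fail")
    # Short timeout: a slow monitoring endpoint must not hang the sync job.
    response = opener(url, timeout=10)
    if hasattr(response, "close"):
        response.close()
    return url


def is_overdue(last_ping: datetime, now: datetime) -> bool:
    """The dead-man rule: silence longer than period + grace raises the alarm."""
    return now - last_ping > PERIOD + GRACE
```

Call `ping(..., ok=True)` only after the write and its verification have both succeeded. The important half is the one that needs no code on your side: if the job never runs at all, no ping arrives, and the service pages anyway.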

Third, we wrote a five-line canary booking test that runs every fifteen minutes. It calls the production voice agent, asks for the south Eindhoven location, and asserts that the agent either offers a real bookable slot or says the location is closed. If the agent confidently offers a slot at a clinic flagged inactive in the source database, the test pages us.
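Placing the call into a production voice agent depends entirely on the voice platform, so this sketch covers only the canary's decision logic. It assumes the test harness turns the agent's reply into a structured result with an offered slot (or none) and a "said closed" flag. `CanaryResult`, `evaluate_canary`, and those fields are hypothetical. The source database's `state` is treated as the answer key. Checking that an offered slot is genuinely bookable would need a second lookup, which is left out here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CanaryResult:
    clinic_id: str
    offered_slot: Optional[str]  # e.g. an ISO timestamp, or None
    said_closed: bool


def evaluate_canary(result: CanaryResult, source_state: str) -> Optional[str]:
    """Return a page message if the agent contradicts the source database, else None."""
    if source_state == "active":
        # An active clinic should get a slot, not a closure notice.
        ok = result.offered_slot is not None and not result.said_closed
    else:
        # An inactive clinic must be reported closed, with no slot offered.
        ok = result.offered_slot is None and result.said_closed
    if ok:
        return None
    return (f"canary mismatch for {result.clinic_id}: source says {source_state!r}, "
            f"agent offered {result.offered_slot!r}, said_closed={result.said_closed}")
```

Any non-`None` return value goes straight to the pager. The south Eindhoven case is the first branch failing: source says inactive, agent offers 09:00.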

Fourth, and this is the one that should have existed from day one: we added a write-then-read verification step on the sync. After writing the locations file, the script re-reads it, parses it, and confirms that the set of active location IDs matches the live API's active set. Mismatch fails loud.
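A sketch of that check, assuming the file uses the `state` field from the JSON example above and that the caller passes in the set of active IDs from the live API. `verify_written`, `active_ids`, and `VerificationFailed` are hypothetical names.

```python
import json
from pathlib import Path


class VerificationFailed(Exception):
    """Raised when the file on disk does not match the live API."""


def active_ids(payload: dict) -> set[str]:
    """IDs of clinics whose state is "active"."""
    return {c["id"] for c in payload.get("clinics", []) if c.get("state") == "active"}


def verify_written(dest: Path, live_active: set[str]) -> None:
    """Re-read the file just written and compare its active IDs with the live API."""
    on_disk = json.loads(dest.read_text())  # raises if the write left invalid JSON
    written = active_ids(on_disk)
    if written != live_active:
        raise VerificationFailed(
            f"only in file: {sorted(written - live_active)}, "
            f"only in API: {sorted(live_active - written)}"
        )
```

Fetching `live_active` with a separate API call, rather than reusing the payload that was just written, makes the comparison independent of the normalize step. That way it also catches bugs in the code that built the file.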

The pattern under the story

Voice agents (and chat agents, and email agents) have a failure mode that traditional CRUD software does not. When a CRUD app's data source goes stale, the app usually shows a blank screen or a 500 error. When a voice agent's data source goes stale, the agent makes up confident, fluent sentences from the last thing it knew. The user has no way to tell the difference.

This is the single most important thing to internalize when you ship an agent into production. The agent is not the risk. The agent's data plumbing is the risk. And the data plumbing has to fail loud, every layer, every time. No defensive fallbacks. No swallowed exceptions. No green dashboards sitting on top of dead jobs.

Takeaway

An AI agent is only ever as truthful as the sync job behind it. Every production agent needs a dead-man switch on its data sources and a canary that asks the agent a question with a known answer, every fifteen minutes, forever.

The five-minute audit you can do this afternoon

If you run a voice or chat agent in production, here is the homework. Open the cron logs for whatever job feeds the agent's knowledge. Look at the last successful write timestamp. If it is more than 36 hours old, you have the same bug. Look at whether the writer has a "keep previous file on empty payload" branch. If it does and that branch raises no alarm, you have the same bug in waiting. Look at whether anyone, anywhere, would actually get a notification if the sync stopped running tonight. If the answer is "the dashboard would go yellow," that is not an answer.
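For the first check, here is a small sketch that reports how old the agent's feed file is. It assumes the file is JSON and prefers a `synced_at` field if the writer records one. Otherwise it falls back to the file's modification time, which works for the "keep the previous file" failure because that branch never rewrites the file. It can mislead if some other job touches the file. `last_write`, `audit`, and the 36-hour `THRESHOLD` are hypothetical names for illustration.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

THRESHOLD = timedelta(hours=36)


def last_write(path: Path) -> datetime:
    """Prefer an explicit synced_at field; fall back to the file's mtime."""
    try:
        ts = json.loads(path.read_text()).get("synced_at")
    except (json.JSONDecodeError, AttributeError):
        ts = None  # not JSON, or not a JSON object
    if ts:
        return datetime.fromisoformat(ts)
    return datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)


def audit(path: Path, now: Optional[datetime] = None) -> tuple[bool, timedelta]:
    """Return (is_fresh, age) for the feed file."""
    now = now or datetime.now(timezone.utc)
    age = now - last_write(path)
    return age <= THRESHOLD, age
```

A `False` here answers the first question. The other two questions (is there a silent fallback, and who gets paged) still need a human to read the code.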

When we rebuilt the voice agent for the dental chain, the bit that took the longest was not the voice, the prompts, or the booking flow. It was the boring observability scaffolding underneath and the gating logic that makes the agent refuse to invent answers when the ground truth is uncertain. That is the work behind every AI agent we ship.

Pick one production agent today. Find its sync job. Add a dead-man switch and a canary. That's the whole homework.

Key takeaway

An AI agent is only ever as truthful as the sync job behind it. Build a dead-man switch and a canary on its data sources before you ship the voice.

FAQ

How did a voice agent confirm 47 bookings to a closed clinic without anyone noticing?

The sync job that fed the agent's knowledge file failed silently after an API field rename. A defensive fallback kept the last good file in place, so the agent kept reading a frozen snapshot for ten weeks.

What is a dead-man switch for a voice agent?

An external check that expects a heartbeat ping every N hours. If the ping stops, the check pages a human. Tools like Healthchecks.io do this for about twenty euros a year.

Why is a silent fallback considered the bug rather than the sync failure?

Sync jobs will fail occasionally. That is normal. The bug is silently keeping stale data in place without alarming anyone, because nothing downstream can tell the difference between fresh truth and frozen truth.

How often should a voice agent be tested against its ground truth?

Every fifteen minutes in production. A simple canary that asks the agent a question with a known correct answer and pages on mismatch will catch most data-source drift before customers do.
