Vet triage chat agent: a 9-clinic Nijmegen playbook
A panicked owner messages on Saturday morning. The reception desk is buried. A chat agent now triages, books an emergency slot, and hands off to a nurse in 40 seconds.

It's 07:48 on a Saturday. A Border Collie on the Nieuwe Marktstraat has eaten something it should not have. The owner is typing into WhatsApp with one hand and holding the dog with the other. The reception desk at the central clinic has nine more chats stacked above hers, the on-call vet has not yet finished her coffee, and one of the three receptionists logged into WhatsApp Business is on the phone with a different owner about a different dog. This is the scene that broke the queue.
Twenty-eight people work in the practice. Nine clinics across greater Nijmegen, from the city centre out to Wijchen and Beuningen. One shared WhatsApp number routed across receptionists who also answer the phone, check pets in, and run the till. By Saturday morning the queue depth was the kind of number nobody wanted to look at. The owners weren't angry yet. They were going to be.
This is the playbook for what we built to replace it. A chat agent that triages the inbound message, books an emergency slot across nine separate Animana calendars, and hands off to a vet nurse inside 40 seconds. The agent is not a vet. It does not pretend to be a vet. The whole point is that it gets out of the way faster than a human ever could.
The triage protocol that does not pretend to be a vet
Before any code, we sat with the head nurse for two days and watched her work. The most important thing we learned: she never tells an owner what is wrong with the animal in chat. She classifies how urgently the case needs to be seen, books the slot, and gets them off the screen and into the building. The agent we built mirrors that exactly.
Four tiers, mapped one-to-one to the protocol the practice already used on the phone:
- Red: immediate. Bleeding that won't stop, seizure, suspected gastric torsion, hit by car, dystocia, suspected poisoning, breathing difficulty. The nurse is paged before the second question is asked.
- Orange: same-day emergency. Suspected fracture, vomiting over twelve hours, lethargy with no appetite for a day, eye injury without active bleeding.
- Yellow: within 48 hours. Worrying but not acute.
- Green: routine. Vaccinations, repeat prescriptions, check-ups.
The agent's output is a structured object, validated before anything else happens. We use Pydantic on the server side. Any model response that does not parse is retried once; if the retry also fails to parse, the conversation is escalated to a human:
from typing import Literal

from pydantic import BaseModel

class TriageDecision(BaseModel):
    tier: Literal["red", "orange", "yellow", "green"]
    confidence: float  # 0.0 to 1.0
    flags: list[str]  # e.g. ["bleeding", "ingested_foreign_object"]
    clinic_preference: str | None  # branch from owner profile if known
    needs_equipment: list[str]  # ["xray", "surgery", "ultrasound"]
    rationale: str  # logged, never shown to the owner
    requires_human_within_s: int  # red = 0, orange = 60, yellow = 600
The rationale field is the audit trail. Every triage decision is logged with the model's reasoning so the practice manager can review them on Monday morning. We bias the classifier toward over-triage. If the model is unsure between orange and red, it picks red. A false positive costs a nurse two minutes. A false negative costs a dog.
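The retry-then-escalate rule and the over-triage bias can be sketched without any framework. This is a minimal illustration, not the practice's code: `parse_decision` is a stand-in for the Pydantic validation, and `call_model`, `most_urgent`, and `triage_with_retry` are hypothetical names.

```python
import json
from typing import Callable, Optional

# Urgency order, most urgent first (the four tiers of the protocol).
TIER_ORDER = ["red", "orange", "yellow", "green"]


def most_urgent(tiers: list[str]) -> str:
    """Over-triage rule: when several tiers are plausible, pick the most urgent."""
    return min(tiers, key=TIER_ORDER.index)


def parse_decision(raw: str) -> dict:
    """Stand-in for TriageDecision validation; raises ValueError on bad output."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or data.get("tier") not in TIER_ORDER:
        raise ValueError("missing or unknown tier")
    return data


def triage_with_retry(call_model: Callable[[], str], retries: int = 1) -> Optional[dict]:
    """Return a parsed decision, or None to signal escalation to a human."""
    for _ in range(1 + retries):
        try:
            return parse_decision(call_model())
        except ValueError:
            continue  # unparseable output: try again, up to the retry limit
    return None
```

A caller would treat `None` as "page a human now" rather than as an error to swallow.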
The calendar arbitration layer
Nine clinics, nine separate calendars in Animana, the practice management system IDEXX sells to most Dutch and Belgian clinics. Each clinic has different opening hours, different equipment, different specialisms. The Wijchen branch has the surgical suite. The downtown branch is feline-only on Wednesdays. The Beuningen branch has the only ultrasound after 18:00.
The slot finder runs after the triage decision lands. It does not negotiate with the model. The model has already told us what equipment the case needs and what the owner's preferred branch is. The arbiter just queries Animana and sorts:
def find_emergency_slot(triage: TriageDecision, owner_postcode: str) -> Slot | None:
    # Search horizon in minutes; green tier is routine and never reaches this function.
    horizon_min = {"red": 60, "orange": 240, "yellow": 2880}[triage.tier]
    candidates = animana.list_open_slots(
        from_ts=now_amsterdam(),
        within_minutes=horizon_min,
        clinics=eligible_clinics(triage.needs_equipment),
    )
    if not candidates:
        return None
    # Shortest drive first; the earlier start time breaks ties.
    candidates.sort(key=lambda s: (
        drive_minutes(owner_postcode, s.clinic.postcode),
        s.start_ts,
    ))
    return candidates[0]
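The `eligible_clinics` helper is not shown in the listing. One plausible shape, assuming a static equipment table per branch, is a simple subset check. The branch keys and equipment sets below are illustrative placeholders, not the practice's real configuration.

```python
# Hypothetical equipment table; real data would come from clinic configuration.
CLINIC_EQUIPMENT: dict[str, set[str]] = {
    "centrum": {"xray"},
    "wijchen": {"xray", "surgery"},
    "beuningen": {"xray", "ultrasound"},
}


def eligible_clinics(needs_equipment: list[str]) -> list[str]:
    """Return the branches that have every piece of equipment the case needs."""
    needed = set(needs_equipment)
    return sorted(name for name, kit in CLINIC_EQUIPMENT.items() if needed <= kit)
```

A fuller version would also account for time-dependent rules such as the feline-only Wednesdays or the after-18:00 ultrasound, which a static table cannot express.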
The slot is reserved by the agent under a service account, then handed to the owner as a plain-text offer. Plain text, always. Early on we sent slot offers as WhatsApp interactive buttons. Owners on older KaiOS-based phones could not tap them. Now the buttons are an enhancement on top of a copy of the same offer written in clear Dutch. If the buttons render, fine. If they don't, the owner replies "ja" and we parse that. The WhatsApp Cloud API docs document both surfaces; we use both, but the text is canonical.
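Parsing the plain-text reply can stay deliberately dumb. The sketch below assumes a small hand-picked vocabulary (the word lists are an assumption, not the deployed ones) and returns `None` for anything ambiguous so the caller can ask again or hand off.

```python
from typing import Optional

# Assumed vocabularies; a real deployment would tune these per language.
YES_WORDS = {"ja", "yes", "ok", "oke", "akkoord"}
NO_WORDS = {"nee", "no"}


def parse_confirmation(text: str) -> Optional[bool]:
    """True = accept, False = decline, None = unclear."""
    words = text.lower().split()
    if not words:
        return None
    first = words[0].strip(".,!?")  # drop punctuation glued to the first word
    if first in YES_WORDS:
        return True
    if first in NO_WORDS:
        return False
    return None
```

Only the first word is inspected, so "ja, maar later graag" still counts as a yes; whether that is acceptable is a product decision.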
The 40-second handoff budget
The whole game is the budget. Forty seconds, from the first inbound message in red tier to a human nurse typing on the other end. The breakdown looks like this in the happy path:
- 0s: message arrives. Agent acknowledges in Dutch within one second. The acknowledgement is intentionally not warm in red tier. No "wat vervelend!". Just "Ik help je nu meteen. Eerste vraag:".
- 1 to 12s: two structured questions, never more than two before a tier decision lands. The model is allowed to skip the second question if the first answer is already a red flag.
- 12 to 14s: structured output validated, slot finder runs against Animana.
- 14 to 18s: slot offered to the owner with the address and the time. Owner confirms.
- 18 to 32s: nurse paged. We page two ways in parallel: a Slack ping with the chat transcript and a phone notification through Animana's existing alerting. Whoever acks first owns the conversation.
- 32 to 40s: nurse opens WhatsApp Business, agent posts a one-line briefing into the same thread for her to read, then steps into observe mode.
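To hold a conversation to that budget, each stage needs a timestamp. A minimal sketch, assuming stage names of our own choosing and elapsed times measured from the first inbound message:

```python
from dataclasses import dataclass, field

BUDGET_S = 40.0  # red-tier handoff budget


@dataclass
class HandoffTimeline:
    """Records when each stage finished, in seconds since the first inbound message."""
    marks: dict[str, float] = field(default_factory=dict)

    def mark(self, stage: str, elapsed_s: float) -> None:
        self.marks[stage] = elapsed_s

    def stage_durations(self) -> dict[str, float]:
        """Time spent in each stage, derived from consecutive marks."""
        durations: dict[str, float] = {}
        previous = 0.0
        for stage, t in sorted(self.marks.items(), key=lambda kv: kv[1]):
            durations[stage] = t - previous
            previous = t
        return durations

    def within_budget(self) -> bool:
        """True if the last recorded stage finished inside the budget."""
        return bool(self.marks) and max(self.marks.values()) <= BUDGET_S
```

Per-stage durations show which step ate the budget when a conversation runs over.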
Your triage agent's success metric is not conversion or sentiment. It is handoff latency in the most urgent tier. Pick that number first and design backward.
Failure modes we hardened against
This is the section that gets cut from most "we built an AI agent" write-ups. It is the only section that matters in production.
Owners send photos
About one message in seven contains an image. We do not run a vision model on the image to classify the wound. We do not run a vision model on the image at all. The image is forwarded into the same thread for the nurse to see, and the agent acknowledges receipt in plain text. The day we let a vision model give medical-shaped commentary on a photo of a dog's eye is the day we get sued. The line is hard and the line stays.
Owners ask for advice
Refuse by default, but refuse warmly. The system prompt forbids any sentence that diagnoses, recommends a treatment, or estimates severity. When an owner asks "denk je dat het ernstig is?" the agent replies: "Dat kan onze verpleegkundige veel beter beoordelen. Ik haal haar er nu bij." Short, useful, honest.
Jailbreaks
Owners under stress do not try to jailbreak the agent. Bored people do. We treat any chat thread containing role-play prompts, system-prompt extraction attempts, or instructions to "ignore the above" as malicious and route it to a human with a flag. Public incidents involving in-product chatbots being abused to perform unintended actions are a useful reminder of what happens when a chatbot has write capabilities and no narrowing on what it is allowed to do. Our agent has exactly one write capability: book a slot whose Animana ID was just returned by the calendar service in the same request cycle. That is the entire attack surface.
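As a rough illustration of the flagging step, a keyword heuristic is enough to show the routing decision. The patterns below are illustrative only and are an assumption on our part; the text does not describe the actual detection logic, and a production list would be broader and reviewed regularly.

```python
import re

# Illustrative patterns only; not an exhaustive or production-grade list.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |the )?(above|previous)", re.IGNORECASE),
    re.compile(r"system prompt", re.IGNORECASE),
    re.compile(r"\b(pretend|act as|you are now)\b", re.IGNORECASE),
]


def is_suspicious(message: str) -> bool:
    """True if the message matches any known manipulation pattern."""
    return any(p.search(message) for p in SUSPICIOUS_PATTERNS)


def route(message: str) -> str:
    """Flagged threads go to a human; everything else stays with the agent."""
    return "human_flagged" if is_suspicious(message) else "agent"
```

False positives here are cheap: the thread simply lands with a human, which is where an urgent case was headed anyway.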
Model hallucinates a slot
The agent is structurally prevented from making up a time. It can only offer slots whose Animana IDs the calendar service returned in the same turn. The slot ID is checked round-trip when the owner confirms. If the model proposes "morgen om 10:00" without an underlying ID, the offer is suppressed and the agent asks the owner to wait for a nurse.
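The guard amounts to an allowlist that is reset every turn. A minimal sketch, with `SlotOfferGuard` and its method names as hypothetical stand-ins:

```python
from typing import Optional


class SlotOfferGuard:
    """Allows only slot IDs that the calendar service returned in the current turn."""

    def __init__(self) -> None:
        self._offered: set[str] = set()

    def start_turn(self, slot_ids_from_calendar: list[str]) -> None:
        # Replace, never extend: IDs from earlier turns are no longer valid.
        self._offered = set(slot_ids_from_calendar)

    def approve_offer(self, proposed_slot_id: Optional[str]) -> bool:
        """False means: suppress the offer and ask the owner to wait for a nurse."""
        return proposed_slot_id is not None and proposed_slot_id in self._offered
```

The same check runs again on confirmation, so a slot ID that has gone stale between offer and "ja" is rejected as well.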
Language
The practice serves a mostly Dutch-speaking catchment, but Nijmegen has a large international student and expat population. We support Dutch, English, German, and Polish. Anything else triggers immediate human handoff with the original message preserved verbatim in the thread.
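The routing rule itself is a set lookup. The sketch assumes some upstream step has already produced an ISO 639-1 language code, or `None` when detection failed; how that detection works is outside this example.

```python
from typing import Optional

# Dutch, English, German, Polish
SUPPORTED_LANGUAGES = {"nl", "en", "de", "pl"}


def route_by_language(lang_code: Optional[str]) -> str:
    """Keep supported languages with the agent; hand everything else to a human."""
    if lang_code and lang_code.lower() in SUPPORTED_LANGUAGES:
        return "agent"
    return "human_handoff"
```

Treating an undetected language as unsupported keeps the failure mode on the safe side.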
If your chat agent's only contact with the medical system is a screenshot of the conversation in a shared Slack channel, you do not have an agent. You have a liability. Wire the audit log into the practice management system from day one.
Things we got wrong
Three of them, all in the first month.
The friendly tone in red tier. Version one of the agent opened every conversation with "Wat vervelend om dit op een zaterdag mee te maken!". Owners with bleeding animals hated it. We removed warmth from the red-tier opener entirely. Yellow and green tier keep it. Tone is not a global setting; it is per-tier.
Out-of-hours handoff. We assumed weekend shifts had reliable coverage. They mostly did. Sunday nights after 22:00, not always. The fix was deterministic: if no nurse acknowledges within 90 seconds in red tier, the agent dials the on-call vet's personal mobile through Twilio with a synthesised summary. We tested this six times in week three. It worked five times. The sixth time the vet had her phone on Do Not Disturb. We added a second number.
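The escalation rule can be sketched as a deadline check plus an ordered dial list. The telephony call is injected as a plain function so the example does not depend on any particular provider's API; `needs_escalation` and `dial_in_order` are hypothetical names.

```python
from typing import Callable, Optional, Sequence

ACK_TIMEOUT_S = 90  # red-tier acknowledgement deadline


def needs_escalation(seconds_since_page: float, nurse_acked: bool) -> bool:
    """Escalate only when the deadline has passed without an acknowledgement."""
    return not nurse_acked and seconds_since_page >= ACK_TIMEOUT_S


def dial_in_order(numbers: Sequence[str], dial: Callable[[str], bool]) -> Optional[str]:
    """Try each on-call number in turn; return the first that picks up, else None."""
    for number in numbers:
        if dial(number):
            return number
    return None
```

For example, `dial_in_order(["primary", "backup"], dial)` falls through to the backup when the first phone does not answer, which covers the Do Not Disturb case. A `None` result still needs a plan of its own.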
The reservation expiry. Animana releases a held slot if it is not confirmed within four minutes. Our agent could hold a slot, the nurse could be slow to acknowledge, and the slot would silently fall back into the public pool. We wrote a thin reservation layer outside Animana that holds the slot independently for eight minutes and reconciles every sixty seconds. It is not glamorous code. It saved the launch.
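A minimal in-memory sketch of such a layer follows. It assumes reconciliation means "re-reserve upstream if the upstream hold has lapsed while the local hold is still live"; the text does not spell out that behaviour, and `is_held_upstream` and `rehold_upstream` are hypothetical callables standing in for calendar calls.

```python
import time
from dataclasses import dataclass
from typing import Callable

HOLD_S = 8 * 60  # local hold duration, in seconds


@dataclass
class Hold:
    slot_id: str
    expires_at: float


class ReservationLayer:
    def __init__(
        self,
        is_held_upstream: Callable[[str], bool],
        rehold_upstream: Callable[[str], object],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._is_held_upstream = is_held_upstream
        self._rehold_upstream = rehold_upstream
        self._clock = clock
        self._holds: dict[str, Hold] = {}

    def hold(self, slot_id: str) -> None:
        self._holds[slot_id] = Hold(slot_id, self._clock() + HOLD_S)

    def active_slot_ids(self) -> list[str]:
        return sorted(self._holds)

    def reconcile(self) -> list[str]:
        """Intended to run every 60 s. Drops expired local holds and re-holds
        any live ones the upstream calendar has released. Returns re-held IDs."""
        now = self._clock()
        reheld: list[str] = []
        for slot_id, hold in list(self._holds.items()):
            if hold.expires_at <= now:
                del self._holds[slot_id]
            elif not self._is_held_upstream(slot_id):
                self._rehold_upstream(slot_id)
                reheld.append(slot_id)
        return reheld
```

The sketch ignores the case where another booking takes the slot between the upstream release and the next reconcile; a real layer would have to detect that and offer the owner a new slot. Injecting the clock keeps the expiry logic testable without sleeping.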
What the practice manager actually changed
The agent did not reduce the headcount. The practice did not want it to. What it did was move three receptionists out of WhatsApp triage and back to the front desk, where there were people standing with carriers waiting to be checked in. The agent handled the new emergency intake. The receptionists handled the humans who were already inside the building. Both queues shrank.
The numbers we measure live on a dashboard the practice manager opens with her morning coffee. Median handoff time in red tier. Count of orange-tier slots that became red mid-conversation (it happens, more often than you'd guess). Count of refusals. Count of human escalations on jailbreak flags. The dashboard is boring. Boring is good.
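Those four numbers can be computed from one record per conversation. The field names below (`initial_tier`, `final_tier`, `handoff_s`, `refusals`, `jailbreak_flag`) are assumptions for illustration, not the practice's schema.

```python
from statistics import median
from typing import Any


def dashboard_summary(conversations: list[dict[str, Any]]) -> dict[str, Any]:
    """Aggregate per-conversation records into the four dashboard numbers."""
    red_handoffs = [
        c["handoff_s"]
        for c in conversations
        if c["final_tier"] == "red" and c.get("handoff_s") is not None
    ]
    return {
        "red_median_handoff_s": median(red_handoffs) if red_handoffs else None,
        "orange_to_red": sum(
            1 for c in conversations
            if c["initial_tier"] == "orange" and c["final_tier"] == "red"
        ),
        "refusals": sum(c.get("refusals", 0) for c in conversations),
        "jailbreak_escalations": sum(1 for c in conversations if c.get("jailbreak_flag")),
    }
```

The median is used rather than the mean so that one slow Sunday night does not hide an otherwise healthy week.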
The thing to do today
When we built the triage agent for this Nijmegen group, the part that kept biting us was the slot reservation race. We solved it with the thin reservation layer described above, plus a reconciliation job that runs every minute. If you are looking at AI agents for a process that talks to humans under stress, the boring infrastructure (locks, reconciliation, audit logs, deterministic escalation) matters more than the model choice.
Open your existing queue today. Pick one conversation thread, any one. Time it end to end with a stopwatch: first inbound message until the customer has what they came for. That number is your real benchmark. Everything else is opinion.
Key takeaway
Your triage agent's success metric isn't conversion or sentiment; it's handoff latency in the most urgent tier.
FAQ
Does the chat agent give medical advice?
No. It classifies urgency, books a slot, and hands off to a vet nurse. It explicitly refuses medical questions and routes them to a human; in red tier the nurse is paged straight away.
How does it pick which of the nine clinics gets the booking?
By urgency tier, equipment needed (X-ray for fractures, ultrasound for abdominal cases), owner postcode for travel time, and the soonest qualifying slot. Animana is the source of truth.
What stops the model from inventing a slot time?
The agent can only offer slots whose Animana IDs the calendar service returned in the same turn. Slot IDs are checked round-trip on owner confirmation. If the model proposes a free-text time with no matching ID, the offer is suppressed before it reaches the owner.
How do you handle jailbreak attempts?
A separate classifier flags role-play, prompt-extraction, and "ignore the above" patterns. Flagged threads route to a human with the original transcript. The agent's only write capability is booking a pre-existing slot.