
Email automation

Pharmacy email agent: triaging 2,340 refill requests weekly

At 08:00 in an Arnhem online pharmacy, 312 emails are already waiting for the dispensing counter to open. The PostNL cut-off is 11:30. Here is what we built.

Jacob Molkenboer · Founder · A Brand New Company · 1 Oct 2025 · 10 min

Eight in the morning, Arnhem. The dispensing counter opens at nine. The night shift at the wholesaler ran from 04:00. Between those two clocks sits an inbox with 312 unread messages, almost all of them asking the same question in slightly different ways: Kan mijn herhaalrecept vandaag de deur uit? (Can my repeat prescription go out the door today?)

That was the daily reality at a 21-person online pharmacy east of the river before we built the email agent. By the time the apotheker had triaged the queue manually, half of the dispatches missed the 11:30 PostNL cut-off. The team had grown from six to twenty-one in three years. The Pharmacom AIS underneath them was fifteen years old. Nobody had time to migrate it, and nobody wanted to be the person who broke a system that had touched 1.4 million prescriptions.

What was actually in the queue

Before we touched any code, we read 2,340 messages. One week's worth. We sorted them by what the writer actually needed:

  • 1,612 herhaalrecept-vragen with a clean AIS match (patient ID, medicine, last issue date all line up)
  • 409 herhaalrecept-vragen where something was off: changed dosage, expired voorschrift, the GP had not yet pushed the herhaling through LSP
  • 211 questions about levertijd because CB Online showed a stock gap on a UR-geneesmiddel
  • 74 messages asking for a different brand of the same werkzame stof
  • 34 messages with what we ended up calling BIG-twijfel: pickups via power of attorney, an unknown voorschrijver, or a BIG-nummer that did not resolve cleanly

Those 34 cannot be automated. Not now, not next year. They are the reason the apotheker is on the floor. The other 2,306 looked tractable.

The shape of the integration

Pharmacom does not have a REST API the way a 2025 SaaS does. It exposes an HL7v2 interface on a fixed internal port, plus a flat-file export the AIS writes to a shared volume every fifteen minutes. We picked the flat file. HL7 would have been faster and prettier, but it also would have required a vendor change request, and the pharmacy had been waiting eleven months on an unrelated change request already. The vendor (PharmaPartners) is responsive, but the queue is the queue.
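
A minimal sketch of the patient match against such an export. The real Pharmacom export layout is not described in this post, so the semicolon delimiter, the header row, and the column names `patient_id`, `medicine`, and `last_issue_date` are all hypothetical.

```python
import csv
import io
from datetime import date


def load_export(text):
    """Index an export by (patient_id, medicine). Column names are hypothetical."""
    index = {}
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        key = (row["patient_id"].strip(), row["medicine"].strip().lower())
        index[key] = date.fromisoformat(row["last_issue_date"].strip())
    return index


def clean_match(index, patient_id, medicine):
    """Return the last issue date for a patient/medicine pair, or None if absent."""
    return index.get((patient_id.strip(), medicine.strip().lower()))


# Made-up sample row, only to show the call shape.
sample = "patient_id;medicine;last_issue_date\nP001;Metformine;2025-08-14\n"
index = load_export(sample)
print(clean_match(index, "P001", "metformine"))  # 2025-08-14
```

Because the file is rewritten every fifteen minutes, rebuilding the whole index on each export is usually simpler than trying to diff it.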

CB Online is a different animal. The stock feed comes in as a nightly CSV plus a small JSON delta endpoint the wholesaler exposes to verified accounts. Refresh cadence is roughly every twenty minutes during business hours. We polled the delta, not the full CSV, and kept a local SQLite mirror so the agent never blocked on a network call.
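
A sketch of the local mirror, assuming the delta arrives as a list of JSON objects. The field names `sku`, `available`, and `updated_at` and the table layout are assumptions for illustration; the actual CB Online payload is not shown in this post.

```python
import sqlite3


def open_mirror(path=":memory:"):
    """Open (or create) the local stock mirror."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stock ("
        "sku TEXT PRIMARY KEY, available INTEGER NOT NULL, updated_at TEXT NOT NULL)"
    )
    return conn


def apply_delta(conn, delta):
    """Write each changed item; rows that are not in the delta stay untouched."""
    with conn:  # one transaction per delta, rolled back on error
        conn.executemany(
            "INSERT OR REPLACE INTO stock (sku, available, updated_at) "
            "VALUES (:sku, :available, :updated_at)",
            delta,
        )


def in_stock(conn, sku):
    """Answer from the mirror only, so the agent never waits on the network."""
    row = conn.execute("SELECT available FROM stock WHERE sku = ?", (sku,)).fetchone()
    return bool(row and row[0] > 0)


conn = open_mirror()
apply_delta(conn, [{"sku": "A1", "available": 12, "updated_at": "2025-10-01T08:00"}])
print(in_stock(conn, "A1"))  # True
```

The poller is the only writer; everything downstream reads the mirror, which is why a slow or failed poll degrades freshness rather than blocking replies.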

incoming-mail (IMAP, info@apotheek-...)
  -> classifier (LLM, prompt v3)
  -> AIS lookup (Pharmacom flat file, patient match)
  -> stock check (CB Online SQLite mirror)
  -> decision tree
       -> auto-reply (SMTP relay)           [85% of traffic]
       -> apotheker queue (webhook -> board) [15%]

That is the whole shape. Two reads, one write, one human queue.

Where the agent draws the line

The triage prompt is short. It tells the model three things, in this order: who is writing (matched patient or unknown), what they are asking about (refill, stock, complaint, brand swap, other), and whether any of seven hard signals are present. The seven signals are not negotiable. Any of them flips the message into the apotheker queue, and the model is instructed never to draft a reply.

  1. Werkzame stof on the BIG-1 watchlist (opioids, benzodiazepines, ADHD-medicatie)
  2. Voorschrijver's BIG-nummer does not resolve in our CIBG mirror
  3. Dosage in the message differs from the last AIS issue
  4. Patient writes about a bijwerking or a contra-indicatie
  5. Power-of-attorney language (a third party collecting on someone's behalf)
  6. Message is in a language other than Dutch or English
  7. The patient's last AIS contact was more than 90 days ago

Warning

A model that decides to bypass a hard signal once will do it again. We do not let the LLM judge severity. The signals are evaluated with deterministic code, and only messages that pass all seven gates ever reach the drafting prompt.
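
The seven gates could be sketched as plain boolean checks. This is an illustration, not the production evaluator: the `Triage` fields, the keyword lists, and the watchlist entries are assumptions, and the message language is assumed to be detected upstream.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Illustrative entries only; the real lists are maintained by the apotheker.
WATCHLIST = {"oxycodon", "diazepam", "methylfenidaat"}
SIDE_EFFECT_TERMS = ("bijwerking", "contra-indicatie", "allergisch")
PROXY_TERMS = ("namens", "machtiging", "on behalf of")


@dataclass
class Triage:
    werkzame_stof: str
    prescriber_resolved: bool        # BIG-nummer found in the CIBG mirror
    requested_dose: Optional[str]    # dose mentioned in the mail, if any
    last_issued_dose: str
    body: str
    language: str                    # e.g. "nl" or "en", detected upstream
    last_contact: date


def fired_gates(t, today):
    """Return the names of all gates that fire; an empty list means 'may be drafted'."""
    text = t.body.lower()
    checks = {
        "watchlist": t.werkzame_stof.lower() in WATCHLIST,
        "unresolved_big": not t.prescriber_resolved,
        "dose_mismatch": t.requested_dose is not None
        and t.requested_dose != t.last_issued_dose,
        "side_effect": any(term in text for term in SIDE_EFFECT_TERMS),
        "power_of_attorney": any(term in text for term in PROXY_TERMS),
        "language": t.language not in {"nl", "en"},
        "stale_contact": (today - t.last_contact).days > 90,
    }
    return [name for name, fired in checks.items() if fired]
```

Returning every fired gate, rather than stopping at the first, gives the apotheker queue a reason list per message and makes the Friday audit easier to read.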

The separation is the part of the system the apotheker checks every Friday afternoon. She reads ten random auto-replies and ten random queued items. If anything in the queue could have been auto-replied, or anything in the auto-reply set should have been queued, we tighten the gates. Six tightening cycles in eight weeks, then it settled.
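
Drawing the Friday sample is a few lines. A sketch, assuming the week's auto-replies and queued items are available as two lists; the function name and the optional seed are illustrative.

```python
import random


def friday_sample(auto_replies, queued, k=10, seed=None):
    """Pick up to k random items from each pile, without replacement."""
    rng = random.Random(seed)
    return (
        rng.sample(auto_replies, min(k, len(auto_replies))),
        rng.sample(queued, min(k, len(queued))),
    )
```

Sampling both piles matters: the auto-reply sample catches messages that should have been queued, and the queue sample catches gates that are too tight.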

The drafting prompt, in plain text

People ask what the prompt looks like. It is shorter than they expect. The interesting work is not in the prompt; it is in the structured input we hand it.

You are drafting a single reply on behalf of {pharmacy_name}.

Facts you may use (do not invent any others):
- Patient: {patient_first_name}, last issue {last_issue_date}
- Medicine requested: {medicine_name} {strength}
- Last issued dosage: {last_dose}
- Current stock: {stock_status}
- Earliest dispatch: {dispatch_eta}
- House signature: {signature_block}

Write in Dutch unless the original message was in English.
Do not promise a delivery date later than {dispatch_eta}.
Do not mention price.
Do not give medical advice.
Close with the house signature, exactly as provided.

The model never sees the BIG-nummer, the stock SKU, or the full AIS row. It sees what a junior assistant would see on a printed slip. That keeps the surface area for hallucination small.
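
One way to enforce that boundary in code is a field whitelist applied before the template is filled. A sketch, assuming the AIS row arrives as a dict; `build_prompt` and `ALLOWED_FIELDS` are hypothetical names, with the fields taken from the template above.

```python
ALLOWED_FIELDS = (
    "pharmacy_name", "patient_first_name", "last_issue_date",
    "medicine_name", "strength", "last_dose",
    "stock_status", "dispatch_eta", "signature_block",
)


def build_prompt(template, record):
    """Fill the template from whitelisted fields only."""
    missing = [f for f in ALLOWED_FIELDS if f not in record]
    if missing:
        raise KeyError(f"missing prompt fields: {missing}")
    # Copy only the whitelisted keys; anything else in the record is dropped here.
    slip = {f: record[f] for f in ALLOWED_FIELDS}
    # A template that references a non-whitelisted field raises KeyError.
    return template.format(**slip)
```

With this shape, adding a field to the prompt is a deliberate change to the whitelist rather than a side effect of a wider AIS query.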

The dispatch confirmation moment

An auto-reply is not the end of the chain. The patient still expects a dispatch confirmation when the parcel actually leaves the pharmacy. In the old workflow this was a manual mail merge that one of the team ran twice a day after the PostNL pickup. The agent now writes the confirmation, but it does not send it. Pharmacom writes a row to its dispensing log when the medicine leaves the building. A small worker tails that log and only then triggers the SMTP relay.

That order matters. Sending the confirmation before the parcel exists is the single complaint that lands a pharmacy in front of the IGJ. So we wired the SMTP send to a real, physical event: the AIS log line that says the medicine left the cabinet.

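def on_dispense_logged(row):
    """Send the pre-written confirmation once the AIS logs the dispatch."""
    # `redis` and `smtp` are the worker's already-configured clients.
    if row["state"] != "DISPATCHED":
        return  # only the dispatch event may trigger a send
    key = f"reply-draft:{row['rx_id']}"
    draft = redis.get(key)
    if not draft:
        return  # apotheker handled this one manually
    smtp.send(
        to=row["patient_email"],
        subject=f"Verzonden: {row['medicine']}",
        body=draft,
        from_addr="info@apotheek-...nl",
        reply_to="info@apotheek-...nl",
    )
    # Reached only if send() did not raise, so a failed send keeps its draft.
    redis.delete(key)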
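
The worker that feeds `on_dispense_logged` could look roughly like this. The format of the Pharmacom dispensing log is not described in this post, so the one-JSON-object-per-line format, the polling interval, and the function names are assumptions.

```python
import json
import time


def read_new_rows(fh):
    """Return parsed rows for every complete line appended since the last call."""
    rows = []
    while True:
        pos = fh.tell()
        line = fh.readline()
        if not line:
            break  # nothing new yet
        if not line.endswith("\n"):
            fh.seek(pos)  # half-written line; read it again on the next poll
            break
        if line.strip():
            rows.append(json.loads(line))
    return rows


def tail(path, handler, poll_seconds=1.0):
    """Follow the log forever and hand each new row to the handler."""
    with open(path, "r", encoding="utf-8") as fh:
        fh.seek(0, 2)  # start at the end: only new events may trigger a send
        while True:
            for row in read_new_rows(fh):
                handler(row)
            time.sleep(poll_seconds)
```

Starting at the end of the file means a restarted worker does not replay old dispatches; the cost is that events logged while the worker was down need a separate catch-up path.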

Numbers after eight weeks

We do not quote percentages without the absolute numbers. So:

  • 2,340 weekly emails in scope (the herhaalrecept and stock-question slice, not the whole pharmacy inbox)
  • 1,989 auto-replied within 12 minutes of receipt, median 4 minutes
  • 351 routed to the apotheker queue, of which 34 were the hard BIG-twijfel cases the system was designed to surface
  • 0 dispatch confirmations sent for parcels that had not actually left the building (verified against the PostNL scan log)
  • 11:30 PostNL cut-off met on 19 of 20 working days during the pilot

The freed time is the line that matters to the founder. The team used to spend 14 hours a week, distributed across three assistants, reading and triaging this inbox. They now spend 2 hours: 90 minutes reviewing the queue, 30 minutes auditing a sample of the auto-replies. Twelve hours back, every week, for a 21-person team. That is the only ROI sentence in the whole project document.
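
The percentages in the flow diagram follow directly from these absolute numbers. A quick check, using only the figures quoted above:

```python
weekly = 2340
auto_replied = 1989
queued = 351
hours_before, hours_after = 14, 2

# The two piles must account for every message in scope.
assert auto_replied + queued == weekly

auto_share = round(100 * auto_replied / weekly)
queue_share = round(100 * queued / weekly)
hours_saved = hours_before - hours_after

print(f"{auto_share}% auto-replied, {queue_share}% queued, {hours_saved} h/week back")
# 85% auto-replied, 15% queued, 12 h/week back
```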

The bits we got wrong first

Three things we would skip if we did it again.

We started with a local model on a Mac Studio in the back office, because the data-sensitivity argument seemed obvious. We were wrong about the bottleneck. The model was not the slow part; the flat-file polling was. We moved the classifier to a hosted API and pinned all PII redaction to a local pre-processor, which is the architecture we should have shipped on day one. The honest answer to the local-versus-hosted question is "depends on the bottleneck," and the bottleneck is rarely the model.
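
A local pre-processor of that kind can be sketched with a few regular expressions. The patterns below (email address, nine-digit number, Dutch postcode) are illustrative assumptions, not the production rule set, and they do not catch names; a real redactor would also mask the name returned by the AIS match.

```python
import re

# Illustrative patterns only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "BSN": re.compile(r"\b\d{9}\b"),
    "POSTCODE": re.compile(r"\b\d{4}\s?[A-Z]{2}\b"),
}


def redact(text):
    """Replace matches with placeholder tokens; return the text and a local mapping."""
    mapping = {}
    for label, pattern in PATTERNS.items():
        def swap(match, label=label):
            token = f"[{label}_{len(mapping) + 1}]"
            mapping[token] = match.group(0)
            return token
        text = pattern.sub(swap, text)
    return text, mapping


redacted, mapping = redact("Mail jan@example.nl, BSN 123456782, 6811 AB Arnhem")
print(redacted)  # Mail [EMAIL_1], BSN [BSN_2], [POSTCODE_3] Arnhem
```

The mapping never leaves the building; only the redacted text goes to the hosted API, and the tokens can be swapped back locally if the reply needs them.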

We tried to make the agent draft brand-swap replies. That category looks innocent and is not. A request for "a different brand of the same werkzame stof" often hides an allergy or a preferentiebeleid issue with the verzekeraar, both of which the apotheker has to see. We pulled that draft path after three weeks.

We assumed BIG-nummers were stable identifiers. They are, until a voorschrijver retires or moves practice and the CIBG entry lags. We now refresh the CIBG mirror weekly and surface any unresolved nummer to the queue rather than guessing. The BIG-register is the source of truth; we mirror it locally to avoid hammering their endpoint, and we cross-check ambiguous cases against the KNMP guidance the apotheker already relies on.
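
The "surface, do not guess" rule can be written as a lookup that only ever returns a match or a reason to queue. A sketch: the mirror is assumed to be a dict keyed by BIG-nummer, and the `active` flag and the stale-mirror check are assumptions added for illustration.

```python
from datetime import date, timedelta

MAX_MIRROR_AGE = timedelta(days=7)  # matches the weekly refresh


def resolve_prescriber(mirror, big_nummer, refreshed_on, today):
    """Return ("ok", entry) or ("queue", reason); never a best guess."""
    if today - refreshed_on > MAX_MIRROR_AGE:
        return ("queue", "mirror older than one week")
    entry = mirror.get(big_nummer)
    if entry is None:
        return ("queue", "BIG-nummer not in mirror")
    if not entry.get("active", False):
        return ("queue", "registration not active")
    return ("ok", entry)


# Placeholder number and entry, for illustration only.
mirror = {"00000000001": {"name": "Voorbeeld", "active": True}}
print(resolve_prescriber(mirror, "00000000001", date(2025, 9, 29), date(2025, 10, 1))[0])  # ok
```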

The architectural takeaway

The model is the cheapest part of a pharmacy email agent. The expensive parts are the seven non-negotiable gates, the dispatch-event trigger, and the weekly apotheker audit. Build those first.

If you take one diagram from this post, take this one. The LLM sits in the middle of the stack, not at the edges. The edges are deterministic: the IMAP listener, the AIS reader, the stock mirror, the gate evaluator, the SMTP relay, the dispense-event listener. The model only ever drafts. The drafts only ever send when a physical event in the building says they should.

That matters for any regulated industry, not just pharmacies. The auditor needs to follow a paper trail from the dispatched parcel back to the original message, and forward to the sent confirmation. A model in the middle of a deterministic loop gives you that trail. A model at the edges does not.

What to do this afternoon

If you run a 10-to-50-person operation with a heavy inbox and a legacy line-of-business system, the smallest useful thing you can do today is read one week of your own mail with a highlighter. Mark every message you wish a junior could have answered without you, and every message you would want to see personally. The ratio between those two piles is the only data point you need to decide whether an email agent is worth scoping. We have done that exercise with everything from a Pharmacom AIS to a 2009 Joomla webshop running a custom invoicing module, and the ratio is almost never what the founder predicts.

When we built this email agent for the Arnhem pharmacy, the thing we underestimated was the audit loop. We ended up giving the apotheker a one-screen dashboard that shows ten random replies and ten random queued items every Friday, and that single screen is what kept the rollout safe. The same pattern fits invoice-chasing, knowledge-base, and onboarding inboxes; our AI agents work walks through the other live deployments where we used it.

Key takeaway

The model is the cheapest part of a pharmacy email agent. The deterministic gates, the dispatch-event trigger, and the weekly apotheker audit are where the work is.

FAQ

Why use a flat-file export instead of HL7v2?

HL7 would have been faster and cleaner, but it required a vendor change request that was already eleven months deep in a queue. The flat file landed on a shared volume every fifteen minutes and was good enough to ship.

How do you stop the model from auto-replying to risky cases?

Seven deterministic gates run before the drafting prompt. If any gate fires (BIG-1 watchlist, dosage mismatch, unresolved BIG-nummer, side-effect language, etc.) the message goes to the apotheker queue and never reaches the drafting prompt.

Why wait for an AIS event before sending the dispatch confirmation?

Confirming a dispatch that has not happened is the single complaint that brings the IGJ to the door. We tail the Pharmacom dispensing log and only release the pre-written confirmation when the medicine actually leaves the cabinet.

How many hours per week did the agent free up?

Twelve. Three assistants used to spend roughly fourteen hours a week on the herhaalrecept inbox. After eight weeks the team spent two hours: ninety minutes on the queue and thirty on the Friday audit sample.

Did you run the model locally for data residency?

We tried, then switched to a hosted API behind a local PII redaction step. The model was never the bottleneck; the flat-file polling was. Residency was solved at the redaction layer, not the model layer.

ai agents · email automation · case study · integrations · operations · workflow
