Email automation

Email agent for btw reminders: an accountancy playbook

At 7:42 on a Monday morning a fiscalist opens a queue of 240 draft btw reminders. Each one written by an agent overnight. Each one waiting for a click.

Jacob Molkenboer · Founder · A Brand New Company · 21 Jul 2025 · 9 min

It is 7:42 on a Monday morning in Arnhem. A fiscalist named Eline opens her queue: 240 draft emails, each one a btw-aangifte reminder for a different client, each one written by an agent overnight. She scans the first five, approves them in a batch, and starts on the next.

That queue used to be a spreadsheet, then a series of mail merges, then a junior associate working from 06:00 to 09:00 every Monday. Today it is software, and the firm sends 1,840 reminder threads a week without a fiscalist ever drafting one from scratch.

This is the playbook for how we got there. The firm runs Twinfield for bookkeeping and Visionplanner for advisory dashboards, a 19-year-old stack that does not ship with an email agent layer. The constraints were specific. No protected client data was allowed to leave the building uncontrolled. No reply containing the word suppletie or naheffing was allowed to be auto-answered. Every outbound email had to be reviewable inside a fiscalist's inbox in under three seconds.

Here is what we built, where we got it wrong, and what you can take from it if you are sitting on a similar stack.

The stack we walked into

Twinfield has been the backbone of Dutch SME bookkeeping for two decades. It exposes a SOAP-era API documented in the Wolters Kluwer developer portal and a newer REST surface that covers about 60% of the SOAP endpoints. Visionplanner sits on top of Twinfield as a reporting and advisory layer.

The firm had three problems.

First, btw-aangifte reminders went out as a batched mail merge from a 2014 Word template, with no client-specific risk language. Every client got the same boilerplate. The fiscalists were getting roughly 80 confused replies a week ("but my Q4 was a refund?") that ate four hours of partner time.

Second, the firm had grown to 33 people but the partners still personally reviewed every reminder thread for any client invoicing over 250k EUR per quarter. That review work was the bottleneck, not the drafting.

Third, the existing Twinfield sync was a Python script on a Windows VM that fell over twice a month and required IT to restart it.

We were not hired to rebuild the bookkeeping system. We were hired to take the reminder pipeline off the partners' plates without introducing software that could one day send a wrong answer to a tax authority question.

Architecture in one diagram

The shape of what we built:

Twinfield (SOAP)  ->  sync worker (Node)  ->  Postgres (client + period state)
                                                       |
                                                       v
                                              draft generator
                                                       |
                                                       v
                            Cloudflare Worker (auth + redaction + routing)
                                                       |
                                                       v
                                               LLM (drafting only)
                                                       |
                                                       v
                                              fiscalist queue
                                                       |
                                                       v
                                              Microsoft 365 send

Two design decisions matter most.

The model never touches Twinfield or Visionplanner directly. It receives a pre-redacted JSON payload from a Cloudflare Worker proxy, which strips KvK numbers, bank accounts, and full names before any request leaves the firm's Cloudflare tenant. The proxy is a deliberate choke point. It logs every outbound token, enforces a model allowlist, and rejects payloads above 4 KB (which would only happen if the upstream draft generator had a bug). If you have followed recent discussion about model provider availability and procurement risk, this is the layer that lets the firm swap providers in an afternoon without touching a line of application code.
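A minimal sketch of the checks such a proxy could run before forwarding a request. The allowlist entries, the redaction patterns, and the name `guardOutbound` are illustrative assumptions, not the firm's code. Full-name redaction is left out because it needs the client record rather than a pattern.

```javascript
// Hypothetical guard run inside the proxy before any request is forwarded.
const MODEL_ALLOWLIST = new Set(['drafting-model-a', 'drafting-model-b']); // placeholder names
const MAX_PAYLOAD_BYTES = 4 * 1024;

// Assumed patterns: Dutch IBANs, and standalone 8-digit runs (the length of a KvK number).
const REDACTIONS = [
  [/\bNL\d{2}[A-Z]{4}\d{10}\b/g, '[IBAN]'],
  [/\b\d{8}\b/g, '[KVK]'],
];

function redact(text) {
  return REDACTIONS.reduce((out, [pattern, label]) => out.replace(pattern, label), text);
}

function guardOutbound(model, payload) {
  if (!MODEL_ALLOWLIST.has(model)) {
    throw new Error(`model not on allowlist: ${model}`);
  }
  const body = redact(JSON.stringify(payload));
  const bytes = new TextEncoder().encode(body).length;
  if (bytes > MAX_PAYLOAD_BYTES) {
    throw new Error(`payload too large: ${bytes} bytes`);
  }
  return body; // safe to forward to the provider
}
```

Keeping the guard as a pure function means it can be tested without a Worker runtime and called from the Worker's request handler unchanged.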

The fiscalist queue is the only system allowed to send. The drafting layer cannot put a message on the SMTP wire. Even if the model produced a perfectly formed email, it would land in a Postgres row with status = 'queued_for_review', and a human would have to click before Microsoft 365 saw it.
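One way to express that gate is a status transition table. Only `queued_for_review` comes from the description above; the other status names and the `reviewerId` field are assumptions made for this sketch.

```javascript
// Allowed status transitions for a draft row; anything else is rejected.
const TRANSITIONS = {
  queued_for_review: ['approved', 'edited'],
  edited: ['approved'],
  approved: ['sent'],
  sent: [],
};

function transition(row, nextStatus, reviewerId) {
  const allowed = TRANSITIONS[row.status] || [];
  if (!allowed.includes(nextStatus)) {
    throw new Error(`illegal transition ${row.status} -> ${nextStatus}`);
  }
  // Approval must carry a human reviewer; the drafting layer has none to offer.
  if (nextStatus === 'approved' && !reviewerId) {
    throw new Error('approval requires a reviewer');
  }
  return { ...row, status: nextStatus, reviewerId: reviewerId ?? row.reviewerId ?? null };
}

// The send step checks this before handing anything to the mail system.
function canSend(row) {
  return row.status === 'approved' && Boolean(row.reviewerId);
}
```

In a real deployment the same rule would presumably also live in the database, so that an application bug cannot bypass it.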

Takeaway

For any accountancy email agent, the agent should never be the system of record for "is this safe to send." That role belongs to a human, a queue, and a database constraint.

The reminder loop

Every Sunday at 22:00 Amsterdam time, the sync worker pulls four things from Twinfield for every client on the firm's roster:

  1. The current btw period and its filing deadline.
  2. The aangifte status (concept, ingediend, voldaan).
  3. The provisional amount payable or refundable.
  4. The last three quarters of filing behaviour (on time, late, suppletie filed).

Those four signals go into a Postgres row. The draft generator wakes up at 23:00, walks the rows where a reminder is due in the next seven days, and constructs a payload per client:

{
  "client_handle": "c_18443",
  "deadline": "2026-06-30",
  "period_label": "Q2 2026",
  "amount_band": "10k_50k_refund",
  "history": {
    "last_three": ["on_time", "on_time", "late_3d"],
    "ever_suppletie": false
  },
  "tone": "neutral_professional",
  "language": "nl"
}

Notice the amount_band. We do not send the model the actual EUR figure. The model receives a band, and the final email template fills the precise number from Postgres at render time. If the model hallucinates an amount, the templating layer throws because the number it produced will not match the band token from the prompt. That single trick has caught two upstream bugs in production so far, both ours.
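A sketch of how the banding and the hallucination check could look. The band edges, the sign convention, and both function names are assumptions; only the `10k_50k_refund` label appears in the payload above. The check reads "hallucinated amount" as "the prose contains a literal euro figure", which is one possible interpretation.

```javascript
// Band edges are illustrative; each entry is [exclusive upper bound in EUR, label].
const BAND_EDGES = [
  [1000, 'under_1k'],
  [10000, '1k_10k'],
  [50000, '10k_50k'],
  [Infinity, 'over_50k'],
];

// Assumed sign convention: positive amounts are payable, negative amounts are refunds.
function toAmountBand(amountEur) {
  if (!Number.isFinite(amountEur)) throw new TypeError('amount must be a finite number');
  if (amountEur === 0) return 'zero';
  const magnitude = Math.abs(amountEur);
  const [, label] = BAND_EDGES.find(([upper]) => magnitude < upper);
  return `${label}_${amountEur < 0 ? 'refund' : 'payable'}`;
}

// Throws if the model's prose contains a literal euro figure instead of
// leaving the amount to the template.
function assertNoLiteralAmounts(prose) {
  const euroFigure = /(?:€|EUR)\s*\d|\d[\d.,]*\s*(?:€|EUR\b|euro\b)/i;
  if (euroFigure.test(prose)) {
    throw new Error('model output contains a literal amount');
  }
}
```

With these edges, a refund of 23,500 EUR maps to `10k_50k_refund`, so the model learns the order of magnitude and direction but never the figure itself.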

The model returns Dutch prose. We render it through a Mustache template that injects the real numbers, the real deadline, and the real client salutation, then writes the row to the fiscalist queue.
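The render step can be sketched without the Mustache dependency. The `renderStrict` helper below is a stand-in for Mustache-style `{{name}}` substitution, with one deliberate difference: it throws on a missing value instead of rendering an empty string. The placeholder names and the sample client are assumptions.

```javascript
// Stand-in for Mustache-style {{name}} substitution that fails loudly on gaps.
function renderStrict(template, values) {
  return template.replace(/\{\{\s*([a-zA-Z_]\w*)\s*\}\}/g, (match, key) => {
    if (!Object.prototype.hasOwnProperty.call(values, key)) {
      throw new Error(`missing template value: ${key}`);
    }
    return String(values[key]);
  });
}

// Model prose carries placeholders; the exact values come from the database row.
const prose =
  'Beste {{salutation}}, uw btw-aangifte over {{period_label}} moet uiterlijk {{deadline}} binnen zijn.';

const email = renderStrict(prose, {
  salutation: 'mevrouw De Vries', // hypothetical client
  period_label: 'Q2 2026',
  deadline: '30 juni 2026',
});
```

Failing on a missing key matters here: a reminder with a blank deadline is worse than a reminder that never reaches the queue.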

The suppletie and naheffing parking lot

This is the part that took the longest to get right.

A suppletie is a supplementary VAT declaration filed when an SME discovers an error in a previously filed aangifte. A naheffing is an additional tax assessment issued by the Belastingdienst, usually with interest and sometimes with a penalty. Both words are reply-killers. The moment a client uses either word, the conversation has moved from "reminder" to "we need a fiscalist."

We built a reply classifier around a fixed list of tripwire terms and a single substring check:

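// Terms that move a reply out of automation and into a fiscalist's hands.
// Matching is by substring, so 'suppletie' also catches 'suppleties' and
// 'controle' also catches 'controleren'.
const TRIPWIRE_TERMS = [
  'suppletie', 'suppleties',
  'naheffing', 'naheffingsaanslag',
  'boete', 'rente',
  'bezwaar', 'bezwaarschrift',
  'controle', 'boekenonderzoek',
];

// Returns true if the reply mentions any tripwire term, case-insensitively.
// A missing body is treated as an empty string rather than throwing.
function shouldPark(replyBody) {
  const haystack = String(replyBody ?? '').toLowerCase();
  return TRIPWIRE_TERMS.some(term => haystack.includes(term));
}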

If shouldPark returns true, the reply is routed to a fiscalist queue with priority high, the client is marked in_human_conversation = true for the next 30 days, and the agent stops drafting any further reminders for that client until a partner clears the flag.
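The parking step could be sketched as below. The predicate is passed in as a parameter so the routing logic stays independent of the term list. The `parked_until` field and the function names are assumptions; the sketch keeps the flag set until someone clears it and uses the 30-day date only as a review prompt.

```javascript
const PARK_DAYS = 30;
const DAY_MS = 24 * 60 * 60 * 1000;

// Routes one inbound reply. `shouldPark` is any function from reply text to boolean.
function routeReply(reply, client, shouldPark, now = new Date()) {
  if (!shouldPark(reply.body)) {
    return { queue: null, action: 'no_action', client };
  }
  return {
    queue: 'fiscalist',
    priority: 'high',
    action: 'parked',
    client: {
      ...client,
      in_human_conversation: true,
      // Assumed field: when a partner should be prompted to review the flag.
      parked_until: new Date(now.getTime() + PARK_DAYS * DAY_MS).toISOString(),
    },
  };
}

// The draft generator would skip any client whose flag is still set.
function mayDraftReminder(client) {
  return client.in_human_conversation !== true;
}
```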

The classifier is intentionally dumb. We tried a model-based intent classifier first. It was 96% accurate, which is not good enough when the remaining 4% includes a client writing "ik heb een naheffing gekregen, wat nu?" and getting a friendly auto-reply about next quarter's deadline. A case-insensitive substring match over ten Dutch terms has 100% recall on every reply that contains one of them, and the false positives (a client writing "geen naheffing dit kwartaal, gelukkig") just mean a fiscalist reads one extra email per week.

Warning

If you build an email agent for a regulated workflow, your safety layer has to be dumb enough that a partner can audit it in 30 seconds. A model classifier you cannot reason about is not a safety layer.

Twinfield and Visionplanner gotchas

Three things bit us.

Twinfield rate-limits at the office level, not the user level. The firm has 33 employees and one office. Running our sync worker in parallel with their existing Windows VM script meant we were eating each other's quota. We moved both jobs behind a single token bucket in our Node worker and cut throttle errors from roughly 200 a week to zero.
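A minimal token bucket of the kind described, with an injectable clock so it can be tested deterministically. The class shape and the capacity and refill numbers are illustrative assumptions, not Twinfield's actual limits.

```javascript
// Token bucket: up to `capacity` requests may burst, refilled at `refillPerSecond`.
class TokenBucket {
  constructor(capacity, refillPerSecond, clock = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.clock = clock;
    this.tokens = capacity;
    this.lastRefill = clock();
  }

  // Adds tokens for the time elapsed since the last refill, capped at capacity.
  refill() {
    const now = this.clock();
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;
  }

  // Returns true and spends a token if one is available, false otherwise.
  tryTake() {
    this.refill();
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Both jobs draw from one bucket, so together they cannot exceed the shared quota.
const officeBucket = new TokenBucket(5, 1); // illustrative numbers
```

The important property is that there is exactly one bucket per Twinfield office, not one per job.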

Visionplanner returns advisory snapshots, not transactional truth. For a reminder, you want the latest filed aangifte, not the latest analyst-massaged figure. We learned this the hard way after a fiscalist flagged a reminder that quoted an amount from the Visionplanner dashboard, which had been overridden by an analyst three days earlier. We now read amounts only from Twinfield's vatReturn endpoint and use Visionplanner exclusively for narrative context the reminder does not numerically depend on.

Twinfield session tokens expire after 60 minutes. The SOAP error code for an expired token is the same as for "client not found." We discovered this when the sync worker silently skipped 11 clients one Sunday. We now treat any Authentication failed response as a token refresh trigger and retry once before logging the client as missing.
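The refresh-and-retry-once rule can be sketched as a small wrapper. The `AUTH_FAILED` code, the `session` object shape, and the function name are hypothetical; the sketch assumes the caller has already mapped the SOAP fault onto `err.code`.

```javascript
// Hypothetical error code the caller assigns to an "Authentication failed" fault.
const AUTH_FAILED = 'AUTH_FAILED';

// Runs `call(token)`; on an authentication failure, refreshes the token and retries once.
async function withTokenRefresh(call, session) {
  try {
    return await call(session.token);
  } catch (err) {
    if (err.code !== AUTH_FAILED) throw err;
    session.token = await session.refresh();
    return call(session.token); // a second failure propagates to the caller
  }
}
```

Only after the retry fails would the caller log the client as missing, which keeps an expired session from silently dropping clients.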

What the fiscalist actually sees

The queue lives in a small internal web app. Three columns: subject line, client, send time. One panel: the full draft. Two buttons: approve, edit. A keyboard shortcut for approve. Eline can clear 240 drafts in 22 minutes on a good Monday, which is faster than the Word merge ever was, because she is reviewing language already tuned to the specific client rather than writing fresh.

Approvals stream back into the sync worker, which sends through the firm's Microsoft 365 tenant. Bounces, replies, and out-of-office responses come back through Microsoft Graph, are run through the tripwire check, and either route to a fiscalist or get logged as no_action.

The firm now sends 1,840 reminder threads a week. The partners have stopped reviewing client emails as a default behaviour. The four hours a week of confused-reply triage have dropped to about 35 minutes, almost all of it suppletie and naheffing conversations that should have been with a fiscalist all along.

What you can do today

If you run an accountancy email pipeline, do this before lunch. List the five Dutch words that, if a client wrote them, should stop your automation cold. Put them in a regex. Wire it to a Slack channel that one partner watches. You now have the cheapest, dumbest, most auditable safety net any email agent can sit behind, and you can build the agent later.
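A sketch of that before-lunch version. The five words are examples taken from this article, and the environment variable name is hypothetical. The Slack side assumes an incoming webhook, which accepts a JSON body with a `text` field, and Node 18 or later for the global `fetch`.

```javascript
// Replace with your own list; these five are examples from this article.
const STOP_WORDS = ['suppletie', 'naheffing', 'boete', 'bezwaar', 'controle'];

const escapeRegex = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
const STOP_PATTERN = new RegExp(STOP_WORDS.map(escapeRegex).join('|'), 'i');

// Builds the Slack message for a reply, or returns null if no stop word is present.
function slackAlert(reply) {
  const hit = reply.body.match(STOP_PATTERN);
  if (!hit) return null;
  return { text: `Tripwire "${hit[0].toLowerCase()}" in reply from ${reply.clientHandle}` };
}

// Posts the alert to the webhook; returns false when there is nothing to report.
async function notify(reply, webhookUrl = process.env.SLACK_WEBHOOK_URL) {
  const payload = slackAlert(reply);
  if (!payload) return false;
  await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  return true;
}
```

The alert carries a client handle rather than a name, so nothing identifying leaves the mail system.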

When we built this for the Arnhem firm, the hardest part was not the model layer. It was convincing the partners that a ten-term keyword check was a more defensible compliance boundary than any model could be. That is the kind of work we end up doing on most email automation engagements: making the safety layer boring enough that a regulator could read it in a coffee break.

Key takeaway

In a regulated email workflow, the safety layer has to be dumb enough that a partner can audit it in 30 seconds. A model classifier you cannot reason about is not a safety layer.

FAQ

Why not use a model classifier for the suppletie tripwire?

Because 96% accuracy is unacceptable when the 4% miss is a client writing about a tax assessment and getting a cheerful reminder back. A substring match over ten terms has 100% recall on the terms that matter, and a partner can audit it in seconds.

Can the agent file the btw-aangifte itself?

No, and that was a deliberate constraint. The agent drafts reminder communication. Filing the aangifte stays inside Twinfield with a fiscalist's sign-off. Mixing drafting and filing in one system is the wrong place to take the regulatory risk.

What happens if Twinfield is unreachable on Sunday night?

The sync worker retries with exponential backoff for 90 minutes, then alerts on-call. The draft generator skips its 23:00 run. Reminders go out a day late at worst, which is fine because deadlines are weeks away, not hours.
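A sketch of a retry schedule with that shape. The starting delay and the per-attempt cap are illustrative assumptions; only the 90-minute budget comes from the answer above.

```javascript
// Delays double from `baseMs`, are capped at `capMs`, and stop once the
// running total would exceed `budgetMs`.
function backoffSchedule(baseMs, capMs, budgetMs) {
  const delays = [];
  let total = 0;
  for (let delay = baseMs; total + Math.min(delay, capMs) <= budgetMs; delay *= 2) {
    const wait = Math.min(delay, capMs);
    delays.push(wait);
    total += wait;
  }
  return delays;
}

const MINUTE = 60 * 1000;
// Illustrative values: start at 1 minute, cap each wait at 15, give up after 90.
const schedule = backoffSchedule(1 * MINUTE, 15 * MINUTE, 90 * MINUTE);
```

Once the schedule is exhausted, the worker would alert on-call rather than keep retrying.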

Why route every model call through a Cloudflare Worker?

One choke point for redaction, logging, model allowlist, and provider routing. If a provider has an outage or a procurement issue, the firm changes one environment variable instead of touching application code, and every outbound token is already logged for compliance.

How big does an accountancy need to be for this to pay back?

Around 15 fee-earners and 400 active clients is the floor where the math starts working. Below that the partner-hours saved per week do not yet cover the engineering and the operational discipline the pipeline demands.
