Email automation
Invoice chase to twenty minutes: one agent, one outbox
The accounts inbox at an Amsterdam freight broker used to swallow a full afternoon every Friday. One email agent and a Postgres outbox now do it in twenty minutes.

Friday, 16:00. Marleen has the invoice spreadsheet open on one screen and Exact Online on the other. Forty-eight invoices past due. Three currencies. A note in column G reads “wait, dit is de andere BV” (“wait, this is the other BV”) next to a customer with two legal entities. She will not leave the desk before seven.
That was the picture at a small Amsterdam freight broker in February. Their accounts person was burning roughly six hours a week chasing money that was already owed. By April, the same job took twenty minutes of human attention, and most weeks she did not touch it at all.
The work is now done by a small email agent and a Postgres outbox. Neither piece is clever on its own. The combination, with the right escape hatches, is what made the project boring enough to ship.
Receivables before we changed anything
The broker handles roughly 280 invoices a month, in EUR, USD, and the occasional GBP. Net 30 is the default, but freight forwarders live with messier terms: some clients pay on customs clearance, some on proof of delivery, some on a calendar day every month. About 15% of invoices need a polite nudge inside the first 30 days, another 8% need a real chase between 30 and 60, and a tail of roughly 3% spills past 60 days and becomes a phone call.
Marleen’s process was the spreadsheet, a saved-replies list in Gmail, and a memory of which customers respond to which tone. Effective, mostly. Slow, definitely. And brittle: when she was on holiday, nothing moved.
The case against buying a chasing SaaS
We looked. Chaser, Upflow, Numeral, a handful of newer entrants. They each solve a real problem for a typical SaaS company that ships a tidy monthly recurring bill in one currency. The broker has none of those properties:
- Three currencies, with FX-sensitive rounding in their ERP.
- Mixed Dutch and English correspondence, frequently inside the same thread.
- Two legal entities under one trade name, billing different customers.
- A handful of “do not chase, ever, ask me first” relationships.
Any tool we looked at could be bent toward this shape with enough mapping and exception rules. By the time we had bent it, we owned a worse version of the same code we would have written ourselves, plus a per-seat bill. So we wrote it.
The shape of the agent
Once an hour, a small Node worker pulls open invoices from the broker’s Exact Online tenant. Each invoice runs through a short pipeline:
- Is this customer on the do-not-chase list? Skip.
- Has the invoice been chased recently? Read the chase ledger.
- Decide the action: nothing, gentle nudge, firm reminder, final notice, or escalate to human.
- If the action is anything other than nothing, generate the email and write it to the outbox.
The decision step is mostly deterministic. Days overdue, customer payment history, currency, and language preference produce a tier. The language model writes the email body using the tier as a constraint, plus a short style guide we keep in a markdown file. We do not ask the model to decide whether to send.
That split matters. The model writes; rules decide. If the model picks the wrong tone, we can read the email in the outbox before it ships. If the model hallucinates an invoice number, the outbox row carries the real invoice ID and the worker rejects any send where they disagree.
The case for a Postgres outbox
The naive version of this looks tempting: the agent decides to chase, calls Postmark, marks the invoice chased. One step. No moving parts. We did not build it that way, for the same reason most production systems do not: dual writes lie.
If the send succeeds but the database write fails, the next run still sees the invoice as unchased and the customer gets a second chase. If the database write succeeds but the send fails, the customer gets nothing and we think we sent one. The pattern that solves this is the transactional outbox: write the intent to a row in the same database transaction as the state change, and let a separate worker turn that row into an actual side effect.
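A minimal sketch of that write path, not the broker's actual code. The `Tx` and `WithTransaction` types are hypothetical stand-ins for whatever Postgres client is in use, and the `chase_ledger` table and its columns are assumptions (the article mentions a chase ledger but does not show its schema). The point is only that both inserts share one transaction and nothing talks to the mail provider inside it.

```typescript
// Hypothetical transaction handle, shaped like the query() call of common Postgres clients.
interface Tx {
  query(text: string, values: unknown[]): Promise<{ rowCount: number }>;
}

// Assumed contract: BEGIN, run fn, COMMIT; ROLLBACK and rethrow if fn throws.
type WithTransaction = (fn: (tx: Tx) => Promise<void>) => Promise<void>;

interface ChaseIntent {
  invoiceId: string;
  toAddress: string;
  subject: string;
  bodyText: string;
  bodyHtml: string;
  language: "nl" | "en";
  tier: "gentle" | "firm" | "final" | "human";
  dedupeKey: string;
  scheduledAt: Date;
}

// Returns true when a new outbox row was queued, false when it was a duplicate.
export async function recordChase(
  withTransaction: WithTransaction,
  c: ChaseIntent,
): Promise<boolean> {
  let queued = false;
  await withTransaction(async (tx) => {
    // 1. Record the intent to send. A duplicate dedupe_key inserts nothing.
    const inserted = await tx.query(
      `insert into email_outbox
         (invoice_id, to_address, subject, body_text, body_html,
          language, tier, dedupe_key, scheduled_at)
       values ($1, $2, $3, $4, $5, $6, $7, $8, $9)
       on conflict (dedupe_key) do nothing`,
      [c.invoiceId, c.toAddress, c.subject, c.bodyText, c.bodyHtml,
       c.language, c.tier, c.dedupeKey, c.scheduledAt],
    );
    if (inserted.rowCount === 0) return; // already queued for this week and tier

    // 2. Record the state change in the same transaction.
    //    chase_ledger and its columns are hypothetical.
    await tx.query(
      "insert into chase_ledger (invoice_id, tier, chased_at) values ($1, $2, now())",
      [c.invoiceId, c.tier],
    );
    queued = true;
  });
  return queued;
}
```

If either insert throws, the transaction rolls back and neither row exists, so the ledger and the outbox cannot disagree. The actual send happens later, in the worker.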
Schema, roughly:
create table email_outbox (
  id bigserial primary key,
  invoice_id text not null,
  to_address text not null,
  cc_addresses text[] not null default '{}',
  subject text not null,
  body_text text not null,
  body_html text not null,
  language text not null check (language in ('nl','en')),
  tier text not null check (tier in ('gentle','firm','final','human')),
  dedupe_key text not null unique,
  status text not null default 'pending'
    check (status in ('pending','holding','sent','failed','cancelled')),
  scheduled_at timestamptz not null,
  sent_at timestamptz,
  provider_id text,
  attempts int not null default 0,
  last_error text,
  created_at timestamptz not null default now()
);
create index email_outbox_ready
  on email_outbox (scheduled_at)
  where status = 'pending';
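The dedupe_key is sha256(invoice_id || tier || iso_week), enforced by the unique constraint. A second run inside the same week, on the same invoice, at the same tier, becomes a no-op insert, provided the insert is written with on conflict (dedupe_key) do nothing; without that clause the unique constraint raises an error instead of skipping quietly. The agent can be retried, restarted, or accidentally run twice and the customer still sees one email.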
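One way to compute such a key, as a sketch under stated assumptions. The function names are hypothetical, the week is computed in UTC, and the `|` separator is an addition that the formula above does not include; it only keeps adjacent fields from running together.

```typescript
import { createHash } from "node:crypto";

// ISO 8601 week label such as "2024-W10", computed in UTC.
export function isoWeek(date: Date): string {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  // Move to the Thursday of this week; the ISO year is that Thursday's year.
  const day = d.getUTCDay() || 7; // Monday = 1 ... Sunday = 7
  d.setUTCDate(d.getUTCDate() + 4 - day);
  const yearStart = Date.UTC(d.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((d.getTime() - yearStart) / 86_400_000 + 1) / 7);
  return `${d.getUTCFullYear()}-W${String(week).padStart(2, "0")}`;
}

// Same invoice, same tier, same ISO week always yields the same key.
export function dedupeKey(invoiceId: string, tier: string, when: Date): string {
  return createHash("sha256")
    .update([invoiceId, tier, isoWeek(when)].join("|"))
    .digest("hex");
}
```

A run on Monday and a rerun on Friday of the same week produce identical keys, so the second insert collides with the first. A change of tier or a new week produces a new key.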
The worker is dull on purpose. It picks up rows where status = 'pending' and scheduled_at <= now(), sends through Postmark, writes the provider message id back, and marks the row sent. On failure it bumps attempts, schedules a backoff, and surfaces the row in a daily Slack digest after three tries.
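The bookkeeping after each send attempt can be sketched as a pure function. Everything named here is hypothetical: the five-minute base delay, the doubling backoff, and the choice to mark a row failed on the third attempt are assumptions, since the article only says the row surfaces in the digest after three tries. `CLAIM_SQL` shows one common Postgres way to claim rows so that two workers never pick up the same one; it is not confirmed as what the broker runs.

```typescript
type Status = "pending" | "holding" | "sent" | "failed" | "cancelled";

interface OutboxRow {
  id: number;
  status: Status;
  attempts: number;
  scheduledAt: Date;
  sentAt: Date | null;
  providerId: string | null;
  lastError: string | null;
}

type SendResult =
  | { ok: true; providerId: string }
  | { ok: false; error: string };

// Assumed values, not taken from the article.
const MAX_ATTEMPTS = 3;
const BASE_DELAY_MS = 5 * 60_000;

// Run inside a transaction; locked rows are skipped by other workers.
export const CLAIM_SQL = `
  select * from email_outbox
  where status = 'pending' and scheduled_at <= now()
  order by scheduled_at
  limit 10
  for update skip locked`;

// Returns the row as it should be written back after one send attempt.
export function applySendResult(row: OutboxRow, result: SendResult, now: Date): OutboxRow {
  if (result.ok) {
    return { ...row, status: "sent", sentAt: now, providerId: result.providerId, lastError: null };
  }
  const attempts = row.attempts + 1;
  if (attempts >= MAX_ATTEMPTS) {
    // Out of retries: park the row so the daily digest can pick it up.
    return { ...row, status: "failed", attempts, lastError: result.error };
  }
  // Exponential backoff: 5 minutes after the first failure, 10 after the second.
  const delay = BASE_DELAY_MS * 2 ** (attempts - 1);
  return { ...row, attempts, lastError: result.error, scheduledAt: new Date(now.getTime() + delay) };
}
```

Keeping the transition pure makes the retry policy easy to test without a database or a mail provider.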
The chasing decision, in code that fits on a screen
The actual decision function is unromantic, which is the point. Most of the logic is in the rules; the language model only writes prose.
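type Tier = 'none' | 'gentle' | 'firm' | 'final' | 'human';
export function decideTier(inv: Invoice, cust: Customer, today: Date): Tier {
  // Hard stops first: protected relationships are skipped, disputes go to a person.
  if (cust.doNotChase) return 'none';
  if (inv.inDispute) return 'human';
  // Three-day grace period after the due date.
  const daysOver = diffDays(today, inv.dueDate);
  if (daysOver < 3) return 'none';
  // At most one chase per invoice in any seven-day window.
  const recentChase = inv.lastChasedAt
    && diffDays(today, inv.lastChasedAt) < 7;
  if (recentChase) return 'none';
  // Escalate with age; only known bad payers get a final notice before day 60.
  if (daysOver < 14) return 'gentle';
  if (daysOver < 30) return 'firm';
  if (daysOver < 60) return cust.badPayer ? 'final' : 'firm';
  // Past 60 days it becomes a phone call.
  return 'human';
}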
The model gets the tier, the customer name, the invoice number and amount, the due date, the preferred language, and a short style guide (“warm but direct, never apologise for chasing, never threaten, no idioms”). It returns a subject and a body. We render the email, store both plain text and HTML in the outbox row, and move on.
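A sketch of how those inputs might be assembled into a prompt. The field names, the wording of the instructions, and the request for JSON output are all assumptions; the article only says the model returns a subject and a body. The sketch also assumes only the three customer-facing tiers reach the model.

```typescript
type SendTier = "gentle" | "firm" | "final";

interface PromptInput {
  tier: SendTier;
  customerName: string;
  invoiceNumber: string;
  amount: number;
  currency: "EUR" | "USD" | "GBP";
  dueDate: Date;
  language: "nl" | "en";
  styleGuide: string; // contents of the markdown style guide
}

// Builds a prompt where every fact comes from structured fields, never from the model.
export function buildPrompt(p: PromptInput): string {
  return [
    `Write a ${p.tier} payment reminder in ${p.language === "nl" ? "Dutch" : "English"}.`,
    `Customer: ${p.customerName}`,
    `Invoice number: ${p.invoiceNumber}`,
    `Amount: ${p.currency} ${p.amount.toFixed(2)}`,
    `Due date: ${p.dueDate.toISOString().slice(0, 10)}`,
    "Use these facts exactly as given. Do not add other numbers or dates.",
    "Return JSON with the keys subject and body.",
    "Style guide:",
    p.styleGuide,
  ].join("\n");
}
```

The tier arrives as an input, so the model has no say in whether an email goes out, only in how it reads.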
We do not ask the model to choose recipients, dates, or amounts. Those come from Exact Online. If the model invents a number, the worker compares it against the structured invoice fields and refuses to send. Most of the model’s failures are caught by that one check.
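A sketch of that kind of check, limited to the invoice number for brevity. The `INV-` pattern is a hypothetical numbering scheme, and `checkDraft` is an illustrative name; a real version would use the ERP's actual number format and could apply the same idea to amounts and dates.

```typescript
export interface Draft {
  subject: string;
  body: string;
}

export type DraftCheck = { ok: true } | { ok: false; reason: string };

// Hypothetical invoice-number shape; the real pattern depends on the ERP's numbering.
const INVOICE_PATTERN = /\bINV-\d+\b/g;

// Rejects a draft that omits the real invoice number or mentions any other one.
export function checkDraft(
  draft: Draft,
  invoiceNumber: string,
  pattern: RegExp = INVOICE_PATTERN,
): DraftCheck {
  // Force the global flag so match() returns every occurrence.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const text = `${draft.subject}\n${draft.body}`;
  const mentioned: string[] = text.match(new RegExp(pattern.source, flags)) ?? [];

  if (!mentioned.includes(invoiceNumber)) {
    return { ok: false, reason: `draft does not mention ${invoiceNumber}` };
  }
  const foreign = mentioned.filter((n) => n !== invoiceNumber);
  if (foreign.length > 0) {
    return { ok: false, reason: `draft mentions other invoice numbers: ${foreign.join(", ")}` };
  }
  return { ok: true };
}
```

A rejected draft never reaches the provider; the row can be regenerated or handed to a person.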
Replies and the “I will pay Tuesday” pile
Sending is the easy half. Replies are the half that actually saves time.
A second worker pulls the shared receivables mailbox over IMAP every five minutes. Each incoming message is matched to its outbox row by the Message-ID we stored on send, threaded back to the invoice, and classified into one of five buckets:
- Payment promised (with a date, if we can find one).
- Already paid (with a date or reference, if quoted).
- Asking for a copy of the invoice or a statement.
- Dispute or query.
- Out of office and other noise.
The first three resolve themselves. “Payment promised” rows get a follow-up scheduled for the promised date plus two days. “Already paid” rows get checked against the ERP’s bank reconciliation and either cleared or escalated. “Copy please” rows trigger a one-shot outbox row with the PDF attached.
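The routing step might look like the sketch below. The classifier itself is not shown. It assumes replies are matched through their In-Reply-To header, which by email convention carries the Message-ID of the message being answered. The `sentMessageIds` map and the action names are hypothetical, and sending unmatched replies or undated promises to a human is an assumption, not something the article states.

```typescript
type Bucket = "payment_promised" | "already_paid" | "copy_requested" | "dispute" | "noise";

interface ClassifiedReply {
  inReplyTo: string;    // In-Reply-To header of the incoming message
  bucket: Bucket;       // output of the classifier, not shown here
  promisedDate?: Date;  // only for payment_promised, when a date was found
}

type Action =
  | { kind: "follow_up"; invoiceId: string; at: Date }
  | { kind: "check_reconciliation"; invoiceId: string }
  | { kind: "send_copy"; invoiceId: string }
  | { kind: "escalate"; invoiceId: string | null }
  | { kind: "ignore" };

const DAY_MS = 86_400_000;

// sentMessageIds maps the Message-ID stored on send to the invoice it chased.
export function routeReply(reply: ClassifiedReply, sentMessageIds: Map<string, string>): Action {
  if (reply.bucket === "noise") return { kind: "ignore" };

  const invoiceId = sentMessageIds.get(reply.inReplyTo);
  // Assumption: a reply we cannot thread goes to a human.
  if (invoiceId === undefined) return { kind: "escalate", invoiceId: null };

  switch (reply.bucket) {
    case "payment_promised":
      // Follow up two days after the promised date; no date means a human looks at it.
      return reply.promisedDate
        ? { kind: "follow_up", invoiceId, at: new Date(reply.promisedDate.getTime() + 2 * DAY_MS) }
        : { kind: "escalate", invoiceId };
    case "already_paid":
      return { kind: "check_reconciliation", invoiceId };
    case "copy_requested":
      return { kind: "send_copy", invoiceId };
  }
  // Disputes and queries always go to a human.
  return { kind: "escalate", invoiceId };
}
```

A promise to pay on 12 March, for example, yields a follow-up scheduled for 14 March.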
Disputes and queries surface in a small Slack channel with the original invoice, the reply, and a one-sentence summary. Marleen picks them up in a few minutes a day. That is most of where her twenty minutes a week now goes.
The kill switch as the whole product
We added a status = 'holding' value to the outbox before we wired up sending. Any operator can flip a single row, or every row for a customer, into holding. The worker ignores holding rows. There is also a global pause: a single row in a feature_flags table that the worker checks each loop. Flip it, nothing sends. We have used it twice in four months: once for a tax-year cutover, once because a customer was acquired and we wanted to pause until the new AP contact was confirmed.
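The gate the worker applies before every send can be as small as the sketch below. The SQL strings are illustrative: the `feature_flags` column names (`name`, `enabled`) and the flag name `pause_sending` are assumptions, since the article does not show that table.

```typescript
type OutboxStatus = "pending" | "holding" | "sent" | "failed" | "cancelled";

interface Sendable {
  status: OutboxStatus;
  scheduledAt: Date;
}

// Hold one row. Only pending rows can be held, so sent rows are never rewritten.
export const HOLD_ROW_SQL =
  "update email_outbox set status = 'holding' where id = $1 and status = 'pending'";

// Global pause. Column and flag names are hypothetical.
export const PAUSE_ALL_SQL =
  "update feature_flags set enabled = true where name = 'pause_sending'";

// The worker reads the global flag each loop and calls this for every candidate row.
export function maySend(row: Sendable, globalPause: boolean, now: Date): boolean {
  if (globalPause) return false;
  return row.status === "pending" && row.scheduledAt.getTime() <= now.getTime();
}
```

Because holding is just another status value, a held row stays in the table with its full content and can be released later by setting it back to pending.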
This part is unglamorous and probably the single most important design choice we made. It is also why the broker trusts the system. The interesting work is not the model. It is the surface around the model that lets a human stop it, audit it, and replay it.
If you cannot pause your agent from a single SQL update, you do not have an agent. You have an outage waiting for a quiet weekend.
Cost and savings
The build was three weeks of one engineer, plus two short calls with the broker’s finance lead to lock down tone and edge cases. We host the worker on a small Hetzner box that the broker already paid for. Postmark and the language model sit inside their existing budget for outbound mail and tooling; the model calls cost roughly EUR 6 a month at current volume.
The measurable change:
- Time on receivables: from about six hours a week to about twenty minutes.
- Median days sales outstanding: 41 down to 34 over the first full quarter.
- Invoices that aged past 60 days: down by roughly half, comparing Q4 last year to Q1 this year.
DSO is the number the broker’s bank cares about. The twenty minutes is the number Marleen cares about. Both moved.
Two things we would do differently
First, we would write the reply classifier before the sender, not after. We built sending first, watched replies pile up in the inbox for two weeks, and then scrambled to triage them. If we had started from the reply side, the design of the outbound emails would have been shaped by the answers we were going to get back, not by what felt natural to send.
Second, we would put the style guide in the same repository as the code from day one, not in a shared document. The model is only ever as well-behaved as the prompt it reads, and prompts that live outside version control drift.
One small audit to run this afternoon
Open your AR aging report. Find the oldest invoice that you could have chased two weeks ago and did not. Send that one email this afternoon. Then open a notebook and write down what you would need to know about a customer to decide which of four tones to use. That list is the spec for the agent you will eventually want.
When we built the email agent for the broker, the thing that took the most iteration was not the prose. It was the rules around when not to send. We ended up solving it by writing the kill switch first and the chase logic second.
Key takeaway
The model writes the email. Rules decide whether to send. A Postgres outbox makes the side effect safe to retry, audit, and pause from a single SQL update.
FAQ
What is a transactional outbox?
A database table that records the intent to send a message in the same transaction as the business change. A separate worker reads the table and performs the side effect, so the database and the outside world cannot disagree.
Why not let the model decide whether to send?
Sending is irreversible. Rules decide; the model only writes the prose for a given tier. The split keeps mistakes auditable and recoverable, and lets a human review any outbox row before it ships.
How do you handle replies in mixed Dutch and English?
The classifier reads both languages, threads replies by Message-ID, and routes payment promises to a follow-up schedule and disputes to a human in Slack. Outbound emails are written in the customer's preferred language.
What stops the agent from chasing twice on the same invoice?
A deterministic dedupe key built from invoice id, tier, and ISO week, with a unique constraint on the outbox table. A second insert is a no-op, so retries and restarts are safe.