Process automation
Bank reconciliation automation: 22 hours to 90 minutes
A Den Bosch bookkeeping practice was burning 22 hours a month on bank reconciliation. One OCR worker and a Postgres ledger cut it to 90 minutes.

The fluorescent lights at a four-person bookkeeping practice in Den Bosch were still on at half past ten on a Friday in March. Their senior accountant, who we'll call Maaike, was on row 814 of a spreadsheet that pulled every bank line for one of their bigger SME clients. She had been at it since after lunch. The job was the monthly reconciliation: matching every line on the bank statement to an invoice, a receipt, a payroll run, or a memo. The match rate hovered around 70 percent on a good month. The other 30 percent meant clicking into Twinfield, opening the corresponding PDF in a second window, and judging.
The practice told us they spent 22 hours per client per month on this, across their book. They had eleven clients on a monthly rhythm. That is 242 hours a month of skilled labour spent on what is, in honest terms, a fancy lookup.
When they asked us to help, we did not propose an "AI bookkeeper" or anything close. We built two small things: an OCR worker that reads incoming invoice PDFs, and a Postgres ledger that does the matching. Three months in, the reconciliation block per client is averaging 90 minutes. They have not lost a single match to a hallucination, because nothing in the booking pipeline hallucinates.
This is how we built it.
What the 22 hours actually were
Before any code, we sat next to Maaike for two afternoons. The 22 hours were not one task. They were six tasks stacked on top of each other, none of them individually painful:
- Importing the bank statement from the client portal as a CAMT.053 XML file or, for one stubborn Rabobank export, a CSV.
- Importing supplier invoice PDFs from a shared inbox.
- Reading the amount, date, and IBAN off each PDF (by eye).
- Looking up which bank line that invoice corresponded to.
- Booking the match in Twinfield with the right ledger code.
- Flagging the lines that did not match for the client to explain.
Steps 3 and 4 ate almost two-thirds of the hours. Step 6, chasing the leftovers, ate most of the remaining third. Step 5, the bit that actually requires an accountant, took less than 20 percent of the time. The skilled work was being crowded out by the lookup.
The architecture, on one page
We landed on a small enough stack that it fits in a paragraph. A Python worker watches an IMAP inbox and an SFTP folder. PDFs go to OCR; bank exports get parsed natively. Everything writes into one Postgres database with three core tables: bank_lines, documents, and matches. A nightly job runs the matcher. A small web view, served by FastAPI, shows the unmatched lines and lets Maaike resolve them in one click. Twinfield gets the booked entries through its SOAP API once a day.
No queue broker. No vector database. No language model in the hot path. Postgres does the matching with pg_trgm and a couple of generated columns. The whole thing runs on a 4 vCPU box at Hetzner that costs less than €30 a month.
The OCR worker
The first instinct, from anyone who has read a single AI newsletter this year, is to pipe every invoice through a vision model and ask it nicely for JSON. We tried that for a week. It worked 92 percent of the time. The 8 percent that failed failed in interesting and silent ways: a swapped digit on an amount, an invoice date read as the due date, a supplier name pulled from the letterhead instead of the legal entity at the bottom.
For accounting, silent 8 percent is unusable.
So we used Tesseract 5, with the Dutch and English language data packs, behind a layout-aware pre-processor. Most Dutch supplier invoices follow a small number of layouts, and the fields we care about (total, BTW number, IBAN, invoice number, invoice date) live in predictable positions. We wrote a small layout-matcher that, on first sight of a new supplier, lets Maaike draw four bounding boxes once. The next invoice from that supplier is read in 200 milliseconds with zero ambiguity. After a month of run-time, the practice had 47 supplier layouts in the database and the auto-rate on incoming invoices was 96 percent.
We do call a vision model, but only as a fallback when the layout-matcher refuses to commit, and only to suggest field positions for human review. The model never books anything directly. The Tesseract documentation still holds up well in 2026 if you give it clean inputs; most of our work was in the pre-processor (deskew, binarize, drop the watermark, isolate the largest table on the page).
def extract_fields(pdf_path: Path, supplier_id: int | None) -> InvoiceFields:
image = render_first_page(pdf_path, dpi=300)
image = deskew(image)
image = binarize(image)
if supplier_id and (layout := load_layout(supplier_id)):
return read_with_layout(image, layout)
# Fallback: queue for human review with vision-model suggestions
suggestion = suggest_field_positions(image)
enqueue_for_review(pdf_path, suggestion)
raise NeedsReview(pdf_path)
The Postgres ledger
The matching engine is the part most teams overthink. We did at first.
Each bank line has an amount, a date, a counterparty IBAN, and a free-text description. Each document (an invoice, a receipt, a payroll run) has the same four things, plus an invoice number. The match is mostly trivial:
CREATE TABLE bank_lines (
id bigserial PRIMARY KEY,
client_id int NOT NULL,
booked_on date NOT NULL,
amount_cents bigint NOT NULL,
counter_iban text,
description text,
description_norm text GENERATED ALWAYS AS
(lower(regexp_replace(description, '[^a-z0-9 ]', ' ', 'gi'))) STORED
);
CREATE INDEX idx_bank_amount_date ON bank_lines (client_id, amount_cents, booked_on);
CREATE INDEX idx_bank_desc_trgm ON bank_lines USING gin (description_norm gin_trgm_ops);
We use pg_trgm for fuzzy description matching, which has been in Postgres for over a decade and is more than enough here. The matcher runs in three passes:
- Exact: same amount, same IBAN, date within plus or minus 3 business days. About 71 percent of lines match here.
- Strong fuzzy: same amount, no IBAN match but description similarity above 0.55, date within plus or minus 7 days. Adds another 18 percent.
- Weak fuzzy: same amount, broader date window, description similarity above 0.30, capped at three candidates per bank line. These do not auto-book. They show up in the review queue with the suggestions ranked.
The auto-book threshold is set deliberately conservatively. Better to ask a human about 30 lines than to silently misbook three.
WITH candidates AS (
SELECT
d.id AS document_id,
b.id AS bank_line_id,
similarity(d.description_norm, b.description_norm) AS sim,
abs(d.issued_on - b.booked_on) AS day_gap
FROM bank_lines b
JOIN documents d
ON d.client_id = b.client_id
AND d.amount_cents = b.amount_cents
AND d.issued_on BETWEEN b.booked_on - 7 AND b.booked_on + 7
WHERE b.matched_at IS NULL
AND d.matched_at IS NULL
)
SELECT *
FROM candidates
WHERE sim > 0.55
ORDER BY sim DESC, day_gap ASC;
For CAMT.053, the Dutch banks have been on the ISO 20022 standard for years now, so the parser is one Python class that maps <Ntry> nodes onto rows. The one Rabobank CSV export still needed its own parser, because of course it did.
What broke first
Three things, in order.
The first surprise was partial payments. A client paid an invoice across two transfers because of a credit-limit constraint on their corporate card. Our matcher saw an amount mismatch on both lines and dropped them into the review queue. We added a "split-match" mode: if the sum of two bank lines from the same counterparty in a 5-day window matches a single document amount exactly, propose the split. Maaike still confirms it, but the suggestion is right 95 percent of the time.
The second surprise was rounded payroll. A salary line of €3,142.86 was being paid as €3,142.85 by one bank, because of a transient rounding bug in their export pipeline. We added a one-cent tolerance on the amount match. We are not proud of it, but it works.
The third was something we should have predicted. The shared inbox got CC'ed on an internal email thread that contained a PDF attachment of a screenshot of a different invoice. Our worker dutifully booked it. We now require an "invoice intent" gate: the sender domain must be on a known-supplier list, or the message subject must contain an invoice-shaped string, or a human has to drag the file into the "Definitely an invoice" folder.
Anything that watches an inbox will eventually try to book the lunch order someone forwarded. Build a positive intent gate, not a negative spam filter. Lists of what counts are easier to reason about than lists of what doesn't.
The numbers after three months
We are wary of round claims, so here are the actual figures from the practice's own time-tracking, averaged over April and May 2026 across their eleven monthly clients:
- Reconciliation hours per client: 22.3 (January-February baseline) down to 1.5.
- Auto-book rate on invoice-to-bank-line matches: 89 percent across the book, with the long tail being clients whose suppliers ship inconsistent invoice templates.
- Review-queue items per client per month: roughly 30, down from "everything".
- Time to onboard a new supplier layout: about 4 minutes, once.
The freed hours have not turned into layoffs. Maaike's practice has taken on three additional clients in the same period without adding headcount, and she now spends Friday afternoons doing advisory work that bills at four times the rate of reconciliation.
On not letting the model book
The Hacker News front page this week had a long thread about recursive self-improvement in code-writing agents. It is a fascinating research direction. It is also exactly the opposite of what you want in an accounting pipeline. Bookkeeping has a regulator. The regulator has expectations about who pressed which button. We made a deliberate choice that the model suggests and the human commits. The Postgres matcher is deterministic; the OCR is auditable; the only place a probabilistic system touches the books is in the review queue, where it is helping a person, not replacing one. That is also the only configuration the practice's compliance officer was willing to sign off on.
Where ABN comes in
When we built this reconciliation pipeline for the Den Bosch practice, the thing we ran into hardest was the silent-failure problem on vision-model OCR. We ended up solving it by collapsing the model's role to "suggest layouts on first sight, never book," which is the pattern we now reach for in any process automation where a regulator or an auditor sits at the end of the chain. The model is a research assistant. The deterministic system is the ledger of record.
If you have a 22-hour-a-month spreadsheet somewhere in your operation, the smallest useful thing you can do today is sit next to the person doing it for one afternoon and write down which sub-tasks actually eat the time. The answer is almost never the one the team would give you in a meeting.
Key takeaway
Lookup work doesn't need intelligence. A clean Postgres index and a deterministic matching rule beat a vision model that's right 92 percent of the time.
FAQ
Does this approach work for banks outside the Netherlands?
Yes, anywhere on the ISO 20022 standard, which covers most of Europe. CAMT.053 is the format. For US banks, you'd parse OFX or a similar bank export instead.
Why not use a single vision model end-to-end?
It fails silently. Our trials hit 92 percent accuracy, but the failures included swapped digits and wrong dates. For bookkeeping with a regulator at the end of the chain, that is unworkable.
How long does it take to onboard a new client?
About two days. One day to import historical bank exports and supplier invoices, one day for a human to draw bounding boxes on the first invoice from each recurring supplier.
Does this handle VAT (BTW) reconciliation?
Indirectly. The bank match resolves the cash side; VAT codes still come from the supplier and product mapping in the accounting package. The pipeline just makes sure the right invoice meets the right bank line.