Process automation

Retourzending reconciliation: the four-eyes SEPA playbook

A 26-person Breda fulfilment SaaS was bleeding refund hours every Friday. Here is the agent that closed the gap between WMS, Exact Online and the SEPA cutoff.

Jacob Molkenboer· Founder · A Brand New Company· 24 Nov 2025· 11 min

Ivory blotter with sealed cream envelope, green ribbon, brass paperclip on receipts, ledger stub, brass balance scale, wax fragment.

The clock above the finance desk in Breda reads 15:42. Mirella, the finance lead, has eighteen minutes before the Rabobank SEPA cutoff at 16:00. In front of her is a spreadsheet with 1,341 weekly retourzendingen waiting to be reconciled against Exact Online, an Excel pivot table that has not opened cleanly since Tuesday, and a Slack DM from the WMS lead asking why the warehouse shows 47 returns that finance has never seen.

This is a Dutch fulfilment SaaS. 26 people. They handle 6,800 returns a week for a portfolio of D2C brands. Their grootboek is a 12-year-old Exact Online tenant with the kind of crusted-over journal entries that show every accountant since 2014. The WMS is homegrown, written by the founder and a now-departed contractor, and it speaks a flat JSON schema that nobody documented.

The brief from the CFO was three sentences. Stop the Friday-evening scramble. Never let a refund over €120 leave without two pairs of eyes. Do not miss the 16:00 SEPA window.

What follows is the playbook we used.

The Friday afternoon shape of the problem

A retourzending in their system has four touchpoints. The WMS receives the parcel and scans the barcode. The WMS calls the order API of the originating webshop to mark the line as returned. Exact Online needs a credit-note journal entry against the original invoice. And the bank needs a SEPA pain.001 file with the refund payment.

Until we arrived, the first two happened automatically. The last two were a person.

That person was Mirella, plus a second finance hire who quit in March, plus a part-timer who only worked Mondays. They reconciled by exporting two CSVs, one from the WMS and one from Exact, pasting them into a shared workbook, running a VLOOKUP that broke whenever a customer used a slightly different email between purchase and return, and then keying refunds into the Rabobank zakelijk portal one at a time.

On a quiet week this took six hours. On a bad week it took fourteen, and refunds slipped to the following Tuesday.

Two systems, one truth

The reconciliation problem looks like a join. It is not. It is a reconciliation in the accountant's sense. Two ledgers that should agree, with a documented audit trail when they do not.

We modelled it as four states per return.

class ReturnState(str, Enum):
    WMS_ONLY = "wms_only"        # scanned in warehouse, no Exact match
    EXACT_ONLY = "exact_only"    # credit note exists, no scan
    MATCHED = "matched"          # both sides agree
    DISPUTED = "disputed"        # both sides exist, amounts differ

Every retour starts as WMS_ONLY when the warehouse scans the parcel. The agent runs every fifteen minutes against the Exact Online REST API, pulls the previous 24 hours of credit notes, and tries to promote rows out of WMS_ONLY into MATCHED.

The match is not on order number. It is on a composite key.

def match_key(record: dict) -> str:
    # Customers reuse emails, change addresses, and webshops generate
    # new order_ids on partial returns. Hash the tuple that actually
    # identifies a refund event, not the one the upstream chose.
    return hashlib.sha256(
        b"|".join([
            record["customer_email"].lower().strip().encode(),
            record["original_order_id"].encode(),
            f"{record['amount_cents']:08d}".encode(),
            record["product_sku"].encode(),
        ])
    ).hexdigest()[:16]

We tried matching on order_id alone for the first week. It missed 11 percent of returns because the webshops issue a new order_id on RMA, and Exact stores the original. We tried adding the customer's name. It missed another 4 percent because Exact title-cases everything and the WMS does not.

SKU plus original_order_id plus amount plus a normalised email is the floor at which the false-positive rate stayed under one in ten thousand for the eight weeks we measured.

Day one was not greenfield. The agent inherited eleven months of unreconciled returns sitting in the WMS with no matching credit note in Exact. We ran a separate one-shot backfill that batched 200 returns at a time, paused between batches to respect Exact's published rate limit of 60 requests per minute per app, and wrote every match into a backfill_audit table so the finance team could spot-check 50 random samples before we trusted the result. Three of those 50 samples surfaced a duplicate credit note from 2024 that nobody had noticed. The backfill paid for the build before the live agent had reconciled a single new return.

The €120 four-eyes queue

The CFO's hard rule was simple. No single human, including her, gets to release a refund over €120 without a second approver. This is the vier-ogen-principe as it shows up in everyday operations. A dual-control gate on anything material, the same principle the Basel Committee documents for operational risk in payments.

The shape we landed on was a queue with three lanes.

def route_refund(refund: Refund) -> Lane:
    if refund.state == ReturnState.DISPUTED:
        return Lane.MANUAL_INVESTIGATION
    if refund.amount_cents > 12_000:           # €120,00 in cents
        return Lane.FOUR_EYES
    if refund.fraud_score > 0.6:
        return Lane.FOUR_EYES
    return Lane.AUTO_RELEASE

Anything in AUTO_RELEASE flows straight into the next SEPA batch. Anything in FOUR_EYES sits in a Postgres table with an approved_by_a and approved_by_b column, both NULL until two distinct finance users sign off in the dashboard. We added a constraint.

alter table refunds_pending
  add constraint two_distinct_approvers
  check (approved_by_a is null
         or approved_by_b is null
         or approved_by_a <> approved_by_b);

It is a one-line check that has caught three attempts in production where Mirella, in a hurry, tried to use the part-timer's open session on the second screen. The database said no.

The dashboard the approvers see sits on a single page. The queue is sorted by amount descending, so the biggest refunds float to the top of the screen and the largest exposure gets the most attention. Each row carries the customer's last three returns from the previous 90 days, the WMS scan timestamp, the Exact credit-note ID, and any discrepancy between the two sides written out in plain Dutch. We stripped every other piece of chrome. The first version had a KPI sidebar with daily totals, week-over-week, batch size, and approval velocity. In week two Mirella told us the sidebar was the noisiest thing on her screen and she wanted the queue and nothing else. We deleted it the same afternoon. The dashboard has not gained a single new widget since.

Warning

Do not implement four-eyes by trusting your frontend to disable a button. The database is the only honest place for this check. Session hijacking, browser back buttons, and well-meaning shortcuts all fail at the application layer.

Cutting the SEPA batch before 16:00

Same-day SEPA Credit Transfer on the Rabobank zakelijk rail closes at 16:00 Amsterdam time for files submitted via the EBICS or H2H channel. Miss it by a minute and your batch settles the next business day, which means an angry customer email on Saturday morning.

Our agent generates a pain.001.001.03 XML at 15:50 every weekday. Ten minutes is a deliberate buffer. It runs three checks before it writes the file.

def assemble_batch(now: datetime) -> Optional[SepaBatch]:
    pending = db.query(Refund).filter(
        Refund.state == ReturnState.MATCHED,
        Refund.lane == Lane.AUTO_RELEASE,
    ).all()

    pending += db.query(Refund).filter(
        Refund.lane == Lane.FOUR_EYES,
        Refund.approved_by_a.isnot(None),
        Refund.approved_by_b.isnot(None),
    ).all()

    total_cents = sum(r.amount_cents for r in pending)
    if total_cents > DAILY_CEILING_CENTS:
        alert_cfo(f"Batch would exceed daily ceiling: EUR {total_cents/100:.2f}")
        return None

    if any(r.iban is None or not iban_valid(r.iban) for r in pending):
        quarantine_invalid_ibans(pending)
        pending = [r for r in pending if r.iban and iban_valid(r.iban)]

    return SepaBatch(
        msg_id=f"BREDA-{now:%Y%m%d-%H%M}",
        execution_date=now.date(),
        payments=pending,
    )

The daily ceiling is set at €45,000. The first week we shipped this, the ceiling triggered at €71,000 because of a single duplicated refund batch from a recovered webshop outage. The CFO got the alert at 15:51 and pulled the file. We dread to think what would have happened on a system without that check.

Takeaway

A ceiling alert that fires on the abnormal day is worth ten dashboards that look pretty on the normal day. Build the brake before you build the speedometer.

What broke first

Four things broke in the first two months. They are all worth naming.

The Exact Online refresh token expired silently at 14:30 on a Wednesday. The agent had been pulling credit notes successfully for nine days. We caught it because the MATCHED rate dropped from 94 percent to zero in one polling interval. We now refresh the token on a thirty-day rolling schedule and page the on-call if the agent receives an HTTP 401 from Exact.

The WMS started returning amounts in euros as floats on one endpoint and in cents as integers on another after a deploy. Our match_key normalised everything to cents, so it broke silently. We added a sanity check. Any single refund amount that came in as a float between 0.01 and 100,000.00 got logged as suspicious and held for review.

The pain.001 file rejected with a CDTR-001 error from Rabobank one afternoon because two payments shared the exact same EndToEndId. We had been generating EndToEndId from a hash of order_id, and the webshop had reused an order_id across two customers. The fix was to include the timestamp in the ID. We learned to never trust the uniqueness of any identifier that originated outside our system.

The last one was the most embarrassing. Daylight Saving Time caught us in week seven. The agent assembled the batch at 15:50 Amsterdam time on weekdays. On the last Sunday in October, when the clocks fell back, our cron job fired at 14:50 because the container was running with TZ unset and defaulted to UTC, while the scheduler that registered the job had quietly done the conversion at deploy time. The Monday batch went out an hour early. No customer complained, but the audit log looked wrong. We fixed it by pinning TZ=Europe/Amsterdam in the container image and adding a smoke test that asserts the assembled batch filename carries the expected hour token before the file leaves the host. We now run that smoke test on the last Sunday of March and October on a separate schedule that no human can forget about.

The numbers after eight weeks

After eight weeks in production, the picture changed shape.

Friday evening finance work dropped from a six to fourteen hour range to a 35 to 50 minute range.
The MATCHED rate sits at 96.4 percent on the first reconciliation pass, 99.1 percent after a second pass at 11:00 the next morning.
The FOUR_EYES queue averaged 47 refunds per day, with a median approval-to-release time of 22 minutes.
Zero refunds over €120 left the bank without two approvers. The constraint did its job.
The 16:00 SEPA cutoff was missed twice, both times during an unrelated Exact Online outage in week three.

The CFO has stopped working Fridays.

What we would steal for the next one

A reconciliation agent is not a join. It is a state machine with a humane queue at the front, a hard constraint in the database in the middle, and a deadline at the end. If you build it in that order you will not regret it.

The pieces that move are always the same. Identifiers from foreign systems lie. Cutoffs are absolute. Four-eyes belongs in the schema, not the UI. Timezones are a constraint, not a configuration value. And the alert that catches the abnormal day pays for itself the first time it fires.

When we shipped this process automation for the Breda fulfilment team, the thing we ran into hardest was the EndToEndId collision. We solved it by treating every external identifier as a hint, never a key, and generating our own internal IDs from a salt plus a timestamp.

If you reconcile anything weekly, the smallest thing you can do today is run one SQL query across your two systems. How many records exist on one side that have no match on the other, this week? If the number surprises you, you have your first ticket.

Key takeaway

Reconciliation is a state machine with a humane queue at the front, a database constraint in the middle, and a deadline at the end.

FAQ

What is the vier-ogen-principe in payment operations?

It is the Dutch term for the four-eyes principle: any material financial action needs two distinct approvers. In refunds it means a second person signs off before money leaves the account.

Why €120 as the four-eyes threshold and not a round number?

It came from the client's own loss-tolerance math. Below €120 the cost of a second approval exceeded the expected fraud exposure. Pick your own threshold from your own loss data, not from a benchmark.

Can you build reconciliation on Exact Online's standard reports?

For low volumes, yes. At 6,800 returns per week you need the REST API, an internal state table, and a polling agent. The standard reports do not expose the freshness or the granularity required.

What if a refund has to leave faster than the 16:00 SEPA cutoff?

Instant SEPA (SCT Inst) settles in seconds, 24/7, but costs more per transaction and not every bank channel supports batches. For one-offs we route them through the bank portal manually with a Slack ping.

process automationautomationintegrationsworkflowoperationscase study

Building something?

Start a project