Process automation case study: 3,420 weekly exceptions

It's 19:43 on a Tuesday in Berchem. The dispatch lead has 612 Bpost rows, 287 PostNL rows, three drivers' douane paperwork, and a WMS from 2010 open.

Jacob Molkenboer · Founder · A Brand New Company · 14 Jun 2026 · 10 min

It's 19:43 on a Tuesday in Berchem, Antwerp. The dispatch lead has four things open: a Bpost track-and-trace export with 612 rows from today, a PostNL CSV with 287, a SharePoint folder with customs (douane) paperwork from three drivers, and a 2010-vintage Centric Carelogistics WMS that still lists 'Internet Explorer 8 or higher' on its login screen. The warehouse opens tomorrow at 06:00. By then she needs to know which parcels are frozen (bevroren) by customs, which are mis-scanned, which can leave on the morning run, and which 14 the controllers have to phone about. This is the kind of operations swamp where process automation either earns its keep or quietly makes things worse.

This is what we walked into in February. We built a process-automation agent that now handles 3,420 of these exceptions a week and never closes a ticket the controller has not signed off on. This post covers what we learned.

The 11pm reconciliation problem

The operator runs roughly 4,800 parcels a day across Flanders and the southern Netherlands, with a 39-person team. Roughly 9% of those parcels generate some kind of exception in a given week: a missed scan, a customs hold (douane-bevroren), an undeliverable, a return-to-sender, a duplicate label that confuses both Bpost and the WMS, or a driver who scanned the parcel into the wrong stop.

Until February, the dispatch lead and one ops controller did all of this by hand. Two tabs of Excel. The PostNL portal in one window, the Bpost track-and-trace exporter in another, the Centric WMS in a third. They closed about 2,400 of the 3,400 weekly exceptions. The rest aged into the next week and accumulated.

The hidden cost was not the hours. The hidden cost was the douane backlog. A customs-frozen parcel that does not reach the right officer with the right paperwork inside 24 hours starts to incur storage fees from the carrier. Over six months in 2025 the operator paid €11,300 in storage they should not have paid. Nobody had attributed that line to a workflow problem until we mapped it.

Inside a Centric WMS deployment from 2010

The WMS is a Centric product the team has been running since 2010. It is not a SaaS. It runs on a Windows Server 2019 VM, talks to a Microsoft SQL Server 2014 instance, exposes a SOAP API on an internal subnet, and has a web UI that the operator long ago skinned with a custom CSS file so it stops looking like 2010.

It is also, in the operational sense, fine. It tracks every parcel, every scan, every driver assignment, every customs hold. The schema is sane. The SOAP endpoints work. What it does not do is talk to Bpost or PostNL on its own, and it does not handle exceptions as first-class objects. An exception in Centric is a status field on a parcel row, not a queue.

This is the situation that earns most of our process-automation work: a legacy system that is fundamentally correct and not worth replacing, sitting next to two or three modern APIs (carriers, customs portals, a CRM) that need to feed into it. The conversation about replacing the WMS would take 18 months and cost €280,000 minimum. The conversation about the agent took six weeks.

The exception taxonomy

Before any agent code was written, we sat with the dispatch lead and the controller for two days and named every kind of exception that crosses their desks. We ended up with 17 categories. The agent only needs to handle the top six on its own. The rest get queued for the controller.

The taxonomy mattered more than the model choice. Here is the version we shipped:

exceptions:
  bpost_customs_hold:        # douane-bevroren, needs officer routing
    auto_route: douane_queue
    sla_seconds: 90
    requires_signoff: true
  bpost_missed_scan:         # parcel exists in WMS, not in Bpost feed
    auto_route: dispatch_queue
    requires_signoff: false
  postnl_undeliverable:      # NL side, retry slot
    auto_route: retry_queue
    requires_signoff: false
  duplicate_label:           # same tracking ID, two parcels
    auto_route: controller_queue
    requires_signoff: true
  driver_misscan:            # wrong stop, detect from GPS
    auto_route: dispatch_queue
    requires_signoff: false
  return_to_sender:          # carrier-initiated
    auto_route: rts_queue
    requires_signoff: true
  unknown:
    auto_route: controller_queue
    requires_signoff: true

Anything the agent cannot match to a known taxonomy lands in controller_queue. That is the failure mode we want. Better to put 40 weird parcels on the controller's desk than to mis-categorise one customs hold.
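To make the fallback concrete, here is a minimal sketch of the lookup. It is an illustration, not the production code: the `Route` dataclass, the `TAXONOMY` dict layout, and the `route_for` function are hypothetical names, and only the category names, queues, and signoff flags come from the taxonomy above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    queue: str
    requires_signoff: bool


# Mirrors the shipped taxonomy above; the dict representation is an assumption.
TAXONOMY = {
    "bpost_customs_hold": Route("douane_queue", True),
    "bpost_missed_scan": Route("dispatch_queue", False),
    "postnl_undeliverable": Route("retry_queue", False),
    "duplicate_label": Route("controller_queue", True),
    "driver_misscan": Route("dispatch_queue", False),
    "return_to_sender": Route("rts_queue", True),
    "unknown": Route("controller_queue", True),
}


def route_for(exception_type: str) -> Route:
    """Return the route for a category, falling back to the 'unknown' entry."""
    return TAXONOMY.get(exception_type, TAXONOMY["unknown"])


print(route_for("bpost_customs_hold"))
print(route_for("label_eaten_by_dog"))  # not in the taxonomy, goes to the controller
```

The design choice is that the fallback is the default branch of the lookup, so a new or misspelled category cannot bypass the controller by accident.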

How the agent reconciles

Every five minutes, the agent runs the following loop:

  1. Pull the latest Bpost and PostNL track-and-trace state for parcels marked 'in transit' in the WMS. Both carriers expose this through their developer portals (bpost.cloud and developer.postnl.nl).
  2. For each parcel, compute the diff between WMS state and carrier state.
  3. Classify the diff against the taxonomy.
  4. Route the exception to the right queue.
  5. For high-stakes categories (customs, RTS, duplicate label), require explicit controller signoff before the agent writes anything back to the WMS.
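The five steps above can be sketched as a single function. This is a simplified illustration under stated assumptions: `run_cycle`, `fetch_states`, `toy_classify`, and the `STATES` data are hypothetical stand-ins for the WMS and carrier clients, and treating `unknown` as a signoff category follows the taxonomy rather than the list above.

```python
# Categories that wait for controller signoff. "unknown" is included because
# the taxonomy marks it requires_signoff: true.
NEEDS_SIGNOFF = {"bpost_customs_hold", "return_to_sender", "duplicate_label", "unknown"}


def run_cycle(parcel_ids, fetch_states, classify):
    """One pass of the loop. fetch_states and classify are injected stand-ins."""
    routed = {"auto": [], "awaiting_signoff": []}
    for parcel_id in parcel_ids:
        wms_status, carrier_status = fetch_states(parcel_id)  # steps 1 and 2
        category = classify(wms_status, carrier_status)       # step 3
        if category is None:
            continue                                          # states agree
        bucket = "awaiting_signoff" if category in NEEDS_SIGNOFF else "auto"
        routed[bucket].append((parcel_id, category))          # steps 4 and 5
    return routed


# Toy data standing in for the WMS and the carrier feeds.
STATES = {
    "P1": ("in_transit", "in_transit"),
    "P2": ("in_transit", "HELD_FOR_CUSTOMS"),
    "P3": ("in_transit", None),
}


def toy_classify(wms_status, carrier_status):
    """Hypothetical classifier covering only the cases in the toy data."""
    if carrier_status is None:
        return "bpost_missed_scan"
    if carrier_status == "HELD_FOR_CUSTOMS":
        return "bpost_customs_hold"
    return None if wms_status == carrier_status else "unknown"


print(run_cycle(STATES, STATES.__getitem__, toy_classify))
```

Passing the fetch and classify functions in as arguments keeps the loop itself trivial to test without a WMS or a carrier connection.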

The loop is dull. That is the point. Operations leads have read enough breathless agent demos. What earns trust is a thousand consecutive five-minute cycles where the agent does the same boring thing.

The interesting code is at the carrier-state diff. Bpost and PostNL use different status vocabularies, different freshness guarantees, and different timezone conventions in their feeds. PostNL gives you UTC. Bpost gives you Brussels local, except in the customs payload, which is UTC. We learned that the slow way.

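# wms_client, utcnow, ExceptionEvent, WMS_KNOWN_STATUSES and the classify_*
# helpers are defined elsewhere in the agent and are not shown here.
def reconcile(parcel, bpost_state, postnl_state):
    """Compare WMS state with carrier state; return an ExceptionEvent or None."""
    wms = wms_client.get(parcel.id)
    carrier = bpost_state or postnl_state
    if carrier is None:
        # Parcel exists in the WMS but in neither carrier feed.
        return classify_missed_scan(wms)

    if carrier.kind == "customs_hold":
        # The bpost customs payload is UTC even though the rest of bpost
        # is Brussels local time.
        hold_age = utcnow() - carrier.held_at_utc
        return ExceptionEvent(
            type="bpost_customs_hold",
            parcel_id=parcel.id,
            payload=carrier.douane_payload,
            age_seconds=hold_age.total_seconds(),
            requires_signoff=True,
        )

    if carrier.status_code not in WMS_KNOWN_STATUSES:
        # Unmapped carrier status: send it to the controller instead of
        # dropping it, per the taxonomy's fallback rule.
        return ExceptionEvent(
            type="unknown",
            parcel_id=parcel.id,
            requires_signoff=True,
        )

    if carrier.status_code != wms.status_code:
        return ExceptionEvent(
            type=classify_status_diff(wms, carrier),
            parcel_id=parcel.id,
        )
    return None  # WMS and carrier agree; nothing to reconcile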
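The timezone mismatch described above is easy to get wrong, so here is a minimal sketch of one way to normalise carrier timestamps before diffing. It assumes the feeds deliver naive timestamps and that the zone has to be inferred from the source; the `to_utc` function and its parameters are hypothetical. It relies on the standard-library `zoneinfo` module, which needs a timezone database on the host.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

BRUSSELS = ZoneInfo("Europe/Brussels")


def to_utc(stamp: datetime, carrier: str, is_customs_payload: bool = False) -> datetime:
    """Attach the right zone to a carrier timestamp and convert it to UTC."""
    if stamp.tzinfo is not None:
        # Already zone-aware: just convert.
        return stamp.astimezone(timezone.utc)
    if carrier == "bpost" and not is_customs_payload:
        # Regular Bpost feed: Brussels local time (CET in winter, CEST in summer).
        return stamp.replace(tzinfo=BRUSSELS).astimezone(timezone.utc)
    # PostNL feeds and the Bpost customs payload are UTC according to the text.
    return stamp.replace(tzinfo=timezone.utc)


scan = datetime(2026, 3, 3, 19, 43)
print(to_utc(scan, "bpost"))                            # shifted by the Brussels offset
print(to_utc(scan, "bpost", is_customs_payload=True))   # taken as UTC
print(to_utc(scan, "postnl"))                           # taken as UTC
```

Using a named zone rather than a fixed one-hour offset matters because the Brussels offset changes with daylight saving time.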

The four-week shadow ramp

An agent that talks to a legacy WMS does not earn trust on day one. We ran the agent in shadow mode for the first two weeks: it reconciled every exception, classified every event, but wrote nothing back to Centric. The controller saw the agent's proposed queue alongside her own work. Every morning she reviewed the deltas. By the end of week two, the controller had flagged 47 disagreements. We fixed 41 of them in code and pushed the remaining six into the unknown queue for permanent human review.

In weeks three and four, we let the agent write back, but only for the three lowest-stakes categories: missed scans, driver misscans, and undeliverables (including their retry slots). Customs holds, RTS, and duplicate labels stayed shadow-only for another fortnight. By week six the full loop was live, and the controller had spent enough time looking at the agent's reasoning to trust it on customs. We have run this same shadow-to-live ramp on every process-automation agent since. The cost is two weeks of slower payoff and a permanent reduction in the chance you ship a confident, wrong agent.
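One way to express a ramp like this is a per-category mode switch. The sketch below is an assumption about how such a switch could look, not the shipped implementation: `Mode`, `ROLLOUT`, and `handle` are hypothetical names, and the configuration shown corresponds to the weeks-three-and-four stage described above.

```python
from enum import Enum


class Mode(Enum):
    SHADOW = "shadow"  # propose only, never write to the WMS
    LIVE = "live"      # allowed to write back


# Hypothetical configuration for the partial write-back stage.
ROLLOUT = {
    "bpost_missed_scan": Mode.LIVE,
    "driver_misscan": Mode.LIVE,
    "postnl_undeliverable": Mode.LIVE,
    "bpost_customs_hold": Mode.SHADOW,
    "return_to_sender": Mode.SHADOW,
    "duplicate_label": Mode.SHADOW,
}


def handle(category, parcel_id, write_back, proposals):
    """Write back only for live categories; otherwise record a proposal."""
    mode = ROLLOUT.get(category, Mode.SHADOW)  # unlisted categories stay in shadow
    if mode is Mode.LIVE:
        write_back(parcel_id, category)
    else:
        proposals.append((parcel_id, category))
    return mode


written, proposals = [], []
record = lambda parcel_id, category: written.append((parcel_id, category))
handle("driver_misscan", "P1", record, proposals)
handle("bpost_customs_hold", "P2", record, proposals)
print(written, proposals)
```

Defaulting unlisted categories to shadow means that promoting a category to live is always an explicit configuration change.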

Why the controller still signs off

The non-negotiable line, from day one, was this: the agent never closes a ticket the controller has not approved. Not for customs. Not for returns. Not for anything where the operator is on the hook with a third party.

Takeaway

An automation agent that quietly closes tickets buys you speed and loses the controller's trust within a week. Put the human at signoff, not at data entry.

What the controller sees in the morning is not a list of 612 raw exceptions. It is a queue of 14 pre-classified, pre-routed, pre-summarised tickets, each with the relevant carrier payload, the WMS row, and a single button: approve and write back, or reject and reclassify. Average signoff time across the controller's queue went from 4.5 minutes per ticket to 38 seconds.

The agent also writes nothing to the SQL database directly. Every WMS write goes through the SOAP endpoint, in the same way a human user would, using the controller's session token. The audit log in Centric still shows the controller as the actor. We did this on purpose. When the auditor comes, 'the AI did it' is not an answer.
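The approve-or-reject button and the single write path can be sketched as a small state transition. Everything named here (`Ticket`, `approve`, `reject`, the `write_to_wms` callable standing in for the SOAP call) is hypothetical, and keeping a rejected ticket pending under its new category is an assumption about how reclassification is re-queued.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Ticket:
    parcel_id: str
    category: str
    status: str = "pending"          # pending -> approved
    approved_by: Optional[str] = None


def approve(ticket: Ticket, controller: str,
            write_to_wms: Callable[[Ticket, str], None]) -> None:
    """Write through the single WMS writer, then record the controller as actor."""
    if ticket.status != "pending":
        raise ValueError(f"ticket for {ticket.parcel_id} is already {ticket.status}")
    write_to_wms(ticket, controller)  # stand-in for the SOAP call
    ticket.status = "approved"
    ticket.approved_by = controller


def reject(ticket: Ticket, new_category: str) -> None:
    """Reclassify without writing anything; the ticket stays pending."""
    if ticket.status != "pending":
        raise ValueError(f"ticket for {ticket.parcel_id} is already {ticket.status}")
    ticket.category = new_category


audit = []
ticket = Ticket("P2", "bpost_customs_hold")
approve(ticket, "controller-1", lambda t, actor: audit.append((t.parcel_id, actor)))
print(ticket, audit)
```

The write happens before the status flips, so a failed write leaves the ticket pending instead of marking it approved with nothing in the WMS.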

Slotting customs-frozen parcels in 90 seconds

The customs case is the one that earns its keep. When a Bpost feed marks a parcel HELD_FOR_CUSTOMS, the agent has to:

  1. Pull the douane payload from Bpost (the TARIC commodity code, declared value, sender, recipient).
  2. Match it against the operator's customs officer roster (Antwerp port has four officers the team works with).
  3. Decide which officer queue gets the parcel, based on commodity type and current officer load.
  4. Drop the pre-filled customs paperwork into the officer's queue.
  5. Alert the dispatch lead's Slack so a courier can be reassigned.
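Steps 2 and 3 amount to a small assignment problem. As a rough sketch, assuming each officer covers a set of commodity chapters (the first two digits of the TARIC code) and that load is a simple count of open parcels, the choice could look like this. The `Officer` dataclass, `pick_officer`, and the roster entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Officer:
    name: str
    taric_chapters: frozenset  # two-digit chapter prefixes this officer handles
    open_parcels: int = 0


def pick_officer(taric_code: str, roster: list) -> Officer:
    """Prefer officers covering the commodity chapter; break ties on load, then name."""
    if not roster:
        raise ValueError("roster is empty")
    chapter = taric_code[:2]
    # Fall back to the whole roster when nobody covers the chapter.
    eligible = [o for o in roster if chapter in o.taric_chapters] or roster
    chosen = min(eligible, key=lambda o: (o.open_parcels, o.name))
    chosen.open_parcels += 1
    return chosen


# Hypothetical roster; names, chapters and loads are made up for illustration.
roster = [
    Officer("officer_a", frozenset({"85"}), open_parcels=3),
    Officer("officer_b", frozenset({"85", "61"}), open_parcels=1),
    Officer("officer_c", frozenset({"22"}), open_parcels=0),
]
print(pick_officer("8517130000", roster).name)
```

Incrementing the load at assignment time keeps a burst of holds from the same feed refresh from all landing on one officer.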

From the moment the Bpost feed flips the status, the parcel sits in an officer's queue within 90 seconds. The previous human-only loop averaged 6 to 14 hours, depending on whether the dispatch lead was actively watching. Customs storage fees over the first ten weeks dropped from a run-rate of about €1,900 a month to €260.

The officer queue itself is not glamorous. It is a Postgres table, a small Next.js page on the internal subnet, and a Telegram bot for the after-hours officer. Nothing in this stack would impress a Hacker News thread. It works because the boring parts were drawn correctly.

Results after three months

Numbers from the operator's own reporting, period 2026-03-01 to 2026-05-31:

  • 3,420 average weekly exceptions reconciled, up from a 2,400 baseline.
  • End-of-week backlog down from an average of 980 to 11.
  • Customs storage fees down 86%.
  • Dispatch lead's evening shift cut by 1.7 hours per day.
  • Zero auto-closed customs tickets. Every one signed off by a controller.
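For readers who want to check the storage-fee figure, the arithmetic is short. The snippet uses only numbers quoted in this post (€11,300 over six months, a rounded run-rate of about €1,900 a month, and €260 a month after); the variable names are ours.

```python
storage_paid_eur = 11_300   # six months in 2025
months = 6
baseline = storage_paid_eur / months   # monthly run-rate before the agent
after = 260                            # monthly run-rate after the agent

drop_vs_rounded = (1_900 - after) / 1_900
drop_vs_exact = (baseline - after) / baseline

print(f"baseline run-rate: €{baseline:,.0f} a month")
print(f"reduction vs €1,900: {drop_vs_rounded:.0%}")
print(f"reduction vs exact baseline: {drop_vs_exact:.0%}")
```

Both the rounded and the exact baseline give the same headline percentage.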

The number we did not improve is the carrier error rate itself. Bpost and PostNL still generate the same volume of exceptions. We did not fix the carriers. We fixed what happens after the carriers fail.

What we would change

Two things we got wrong on the first cut.

First, we initially built the controller signoff as a daily digest at 07:00. That was wrong for customs. A customs hold at 09:00 should not wait until the next morning. We split the queue into 'signoff today' (customs, RTS, duplicates) and 'signoff this week' (everything else) in week three. The daily digest pattern is fine for most ops work and wrong for anything time-bound.

Second, we underestimated how much observability the controller needed. The first version of the agent ran silently. The controller asked, reasonably, 'how do I know it's running?' We added a status page, then a per-loop log, then a per-parcel trail. If you are deploying an agent against legacy infrastructure and you have not built the operator a way to see what the agent did and why, you have shipped half a product. The first time something goes wrong, you will need to explain it line by line, and you will not have the lines.
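A per-parcel trail does not need much machinery. As an illustration, assuming one JSON line per parcel per cycle, a trail record could be built like this. The field names and the `trail_entry` function are hypothetical, not the format we shipped.

```python
import json
from datetime import datetime, timezone


def trail_entry(cycle_id, parcel_id, wms_status, carrier_status,
                category, action, now=None):
    """Build one JSON line explaining what the agent saw and did for a parcel."""
    now = now or datetime.now(timezone.utc)
    return json.dumps({
        "ts": now.isoformat(),
        "cycle_id": cycle_id,
        "parcel_id": parcel_id,
        "wms_status": wms_status,
        "carrier_status": carrier_status,
        "category": category,
        "action": action,
    }, sort_keys=True)


print(trail_entry(1041, "P2", "in_transit", "HELD_FOR_CUSTOMS",
                  "bpost_customs_hold", "queued_for_signoff"))
```

Recording both the WMS status and the carrier status alongside the decision is what lets someone reconstruct later why the agent classified a parcel the way it did.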

The small thing you can do today

When we built this exception-reconciliation agent for the Antwerp operator, the breakthrough was not the model. It was sitting at the controller's desk for two days and naming every kind of exception out loud. Most operations teams cannot list theirs without ten minutes of thinking. That gap is the work. If you want to know whether your shop is ready for process automation, do the two-day taxonomy first.

Open a spreadsheet. List every exception your team handled last week. Group them. The categories that show up five or more times are the agent's job. The ones that show up once are the controller's job. The agent should not touch those.
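If the spreadsheet is already a list of labels, the grouping takes a few lines. This is a small sketch with made-up data; `split_by_frequency` is a hypothetical helper, and the middle bucket (two to four occurrences) is left undecided because the rule of thumb above does not cover it.

```python
from collections import Counter


def split_by_frequency(labels, threshold=5):
    """Group exception labels into agent, undecided and controller buckets."""
    counts = Counter(labels)
    agent = sorted(c for c, n in counts.items() if n >= threshold)
    undecided = sorted(c for c, n in counts.items() if 1 < n < threshold)
    controller = sorted(c for c, n in counts.items() if n == 1)
    return agent, undecided, controller


# Made-up week of exceptions, one label per handled case.
last_week = (["missed_scan"] * 7 + ["customs_hold"] * 5
             + ["wet_label"] * 2 + ["parcel_in_canal"])
agent, undecided, controller = split_by_frequency(last_week)
print("agent:", agent)
print("undecided:", undecided)
print("controller:", controller)
```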

FAQ

Why didn't you replace the 16-year-old Centric WMS instead of building an agent on top?

Replacement was an 18-month, €280,000 conversation. The WMS schema and SOAP API were correct. The gap was carrier integration and exception handling, both of which an agent could close in six weeks.

Does the agent ever close a customs ticket without a human?

No. Customs, returns-to-sender, and duplicate labels all require explicit controller signoff before the agent writes back to the WMS. The audit trail in Centric shows the controller as the actor.

How does the agent know which customs officer should get a frozen parcel?

It matches the Bpost douane payload (TARIC commodity code, declared value, sender) against the operator's officer roster and current officer load. The decision lands in the queue within 90 seconds.

What was the single biggest operational win after three months?

Customs storage fees dropped 86%, from roughly €1,900 a month to €260. Faster officer routing on customs-frozen parcels was the lever; nothing else came close on the P&L.
