Customs agent playbook: 3,180 weekly aangiften, three systems

A 26-person Den Haag freight forwarder, 3,180 douaneaangiften a week, three customs systems that disagree, and one queue for anything over €25,000 in import duties.

Jacob Molkenboer · Founder · A Brand New Company · 7 Sept 2025 · 9 min

It's Monday 06:40 in Den Haag. The weekend shift logged 612 douaneaangiften into AGS between Friday afternoon and Sunday night. Eighty-four are sitting in a "controle pending" state because the CSV pulled from Cargonaut doesn't agree with the AGS confirmation. Forty-one of those carry over €25,000 in invoerrechten each, and the senior douane-expediteur won't be at his desk until 08:30. Until he signs off, nothing ships.

This is the operator we built for: twenty-six people, one Den Haag office, roughly 3,180 customs declarations a week across three systems that were never designed to talk to each other. What follows is the playbook we wish we'd had on day one.

The shape of the work

Every shipment that clears the warehouse touches three systems before goods can roll. AGS is the Dutch import declaration system run by the Belastingdienst. It's being phased out for DMS, but in 2026 it still carries most of the import traffic. NCTS is the EU transit system, with MRN as the canonical join key. And then there's the Cargonaut portal at Schiphol, now folded into Portbase but still serving a 17-year-old web frontend for air-freight pre-notifications.

The interesting failures live in the seams. AGS confirms a declaration. NCTS opens a transit. Cargonaut reports the cargo arrived. Then something doesn't line up: the gewicht (weight) on the Cargonaut record is 14 kg lighter than on the AGS goederencode (commodity code) line, or the MRN on the NCTS document points at an aangifte that was cancelled and re-filed under a different number. The operator's job is to catch this before the Douane does.

Before the agent existed, that catching happened in a shared Excel workbook that one senior expediteur maintained by hand. It took roughly four hours every morning. On Mondays it took eight.

Read for ten weeks before you write

The single decision that saved us from a disastrous launch was refusing to call the AGS submit endpoint for the first ten weeks.

The agent ran in read-only mode against all three systems. It pulled the AGS XML feed, polled NCTS through the official channel, and screen-scraped Cargonaut with a headless browser. Everything landed in a Postgres schema we treated as append-only. No updates, no deletes. Each ingest produced a new row keyed by (source_system, source_id, observed_at).
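Below is a minimal in-memory sketch of that ingest rule. It assumes a hypothetical `Observation` shape; in production this is the Postgres schema, where a unique constraint enforces the key. Every poll appends a row. A repeated `(source_system, source_id, observed_at)` key is refused rather than overwritten. The current state of a record is derived by reading its latest observation.

```typescript
// Hypothetical shape of one observation pulled from AGS, NCTS or Cargonaut.
type SourceSystem = "ags" | "ncts" | "cargonaut";

interface Observation {
  sourceSystem: SourceSystem;
  sourceId: string; // e.g. a declaration reference, an MRN or an AWB
  observedAt: Date;
  payload: Record<string, unknown>;
}

// Append-only: there is no update and no delete method, only append.
class ObservationLog {
  private readonly rows: Observation[] = [];
  private readonly keys = new Set<string>();

  private static keyOf(o: Observation): string {
    return `${o.sourceSystem}|${o.sourceId}|${o.observedAt.toISOString()}`;
  }

  // Returns false, and stores nothing, if this exact key was already ingested.
  append(o: Observation): boolean {
    const key = ObservationLog.keyOf(o);
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    this.rows.push({ ...o });
    return true;
  }

  // Current state is derived: the most recent observation of a source record.
  latest(sourceSystem: SourceSystem, sourceId: string): Observation | undefined {
    let best: Observation | undefined;
    for (const r of this.rows) {
      if (r.sourceSystem !== sourceSystem || r.sourceId !== sourceId) continue;
      if (!best || r.observedAt.getTime() > best.observedAt.getTime()) best = r;
    }
    return best;
  }

  // Full history, oldest first, for audit and diffing.
  history(sourceSystem: SourceSystem, sourceId: string): Observation[] {
    return this.rows
      .filter(r => r.sourceSystem === sourceSystem && r.sourceId === sourceId)
      .sort((a, b) => a.observedAt.getTime() - b.observedAt.getTime());
  }
}
```

A cancellation by the Douane then shows up as a new observation with a later timestamp. The earlier "accepted" row is never rewritten.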

The read-only phase bought us three things. First, a real diff between what the operator thought was in each system and what was actually there. Three of the first four weeks surfaced declarations the team didn't know had been cancelled by the Douane. Second, a baseline error rate: across 12,400 declarations the agent watched in silence, 0.7% (about 87) had a Cargonaut-vs-AGS gewicht mismatch large enough to matter. That's the number that defined the reconciliation queue. Third, the confidence to ship the write path. By week ten, the agent's proposed actions matched the senior expediteur's actual actions on 97.2% of the cases we reviewed retrospectively. The 2.8% gap was almost entirely edge cases on chartered air freight.
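The agreement figure is a plain comparison, not a model score. This sketch assumes a hypothetical `ReviewedCase` record. Each record pairs the action the agent proposed with the action the senior expediteur actually took. A lane label lets disagreements be counted per lane, which is how a cluster such as chartered air freight becomes visible:

```typescript
// Hypothetical record produced by the retrospective review.
interface ReviewedCase {
  caseId: string;
  lane: string;          // e.g. "sea", "air-scheduled", "air-charter"
  agentProposed: string; // action the agent would have taken
  humanTook: string;     // action the expediteur actually took
}

interface AgreementReport {
  total: number;
  agreed: number;
  agreementRate: number; // 0..1
  disagreementsByLane: Map<string, number>;
}

function agreementReport(cases: ReviewedCase[]): AgreementReport {
  let agreed = 0;
  const disagreementsByLane = new Map<string, number>();
  for (const c of cases) {
    if (c.agentProposed === c.humanTook) {
      agreed += 1;
    } else {
      // Count disagreements per lane so clusters stand out.
      disagreementsByLane.set(c.lane, (disagreementsByLane.get(c.lane) ?? 0) + 1);
    }
  }
  const total = cases.length;
  return {
    total,
    agreed,
    agreementRate: total === 0 ? 0 : agreed / total,
    disagreementsByLane,
  };
}
```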

Takeaway

If your agent will eventually file customs declarations, the most useful thing it can do in month one is watch in silence and disagree with humans on paper.

The €25,000 four-eyes queue

The €25,000 invoerrechten threshold isn't a regulatory line. It's the line above which a mistake hurts the forwarder enough that the senior expediteur insisted on a second pair of eyes before anything was submitted. That's a business rule, and the agent enforces it before it ever touches the AGS submit endpoint.

The gate sits in the application code, not in the policy doc. Here's the actual shape of the check, simplified:

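// Business rule, not a regulatory limit: €25,000.00 expressed in euro cents.
const FOUR_EYES_THRESHOLD_CENTS = 2_500_000;

async function gateForSubmit(aangifte: Aangifte): Promise<GateResult> {
  // Duties are computed deterministically in code; no model is involved.
  const duties = computeInvoerrechten(aangifte);

  if (duties.cents >= FOUR_EYES_THRESHOLD_CENTS) {
    return {
      action: "queue_for_review",
      queue: "four_eyes_high_value",
      reason: `invoerrechten ${formatEUR(duties)} meets or exceeds €25,000 threshold`,
      requiredApprovers: 2,
    };
  }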

  const anomalies = await detectAnomalies(aangifte);
  if (anomalies.length > 0) {
    return {
      action: "queue_for_review",
      queue: "anomaly_review",
      reason: anomalies.map(a => a.label).join("; "),
      requiredApprovers: 1,
    };
  }

  return { action: "submit_directly" };
}

Two things matter here. First, the threshold is one constant in one file. If we expose it in the admin UI, the compliance officer can change it without a deploy. Either way, it never lives inside an LLM prompt. Second, the queue is a real database object with named approvers and a state machine, not a Slack channel.
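Here is one way to read "a state machine with named approvers", sketched in memory with hypothetical names. The real thing is a table plus transition rules. An item waits until it has approvals from the required number of distinct people. Any single rejection ends it. No transition is allowed out of a final state.

```typescript
type ReviewState = "waiting" | "approved" | "rejected";

// Hypothetical in-memory stand-in for one row of the review queue.
class ReviewItem {
  readonly aangifteId: string;
  readonly queue: string;
  readonly requiredApprovers: number;
  state: ReviewState = "waiting";
  rejectedBy: string | null = null;
  private readonly approvals = new Set<string>(); // named, distinct approvers

  constructor(aangifteId: string, queue: string, requiredApprovers: number) {
    this.aangifteId = aangifteId;
    this.queue = queue;
    this.requiredApprovers = requiredApprovers;
  }

  // Final states are final: approved or rejected items accept no more actions.
  private assertWaiting(action: string): void {
    if (this.state !== "waiting") {
      throw new Error(`cannot ${action} ${this.aangifteId} in state ${this.state}`);
    }
  }

  approve(approver: string): ReviewState {
    this.assertWaiting("approve");
    // A Set means the same person approving twice still counts once.
    this.approvals.add(approver);
    if (this.approvals.size >= this.requiredApprovers) this.state = "approved";
    return this.state;
  }

  reject(approver: string): ReviewState {
    this.assertWaiting("reject");
    this.state = "rejected";
    this.rejectedBy = approver;
    return this.state;
  }

  approvers(): string[] {
    return [...this.approvals];
  }
}
```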

Warning

Never let the LLM decide whether a declaration crosses a money threshold. Compute the number in code, compare it against the threshold, and branch on the result. The model can summarise the case for the reviewer; the gate is a function.
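To make "compute the number in code" concrete, here is a sketch of a deterministic ad valorem duty calculation in integer cents. The `DutyLine` shape, the basis-point rate encoding and the round-half-up rule are assumptions for illustration. Real duty calculation has more cases, such as specific duties, preferences and additional duties. The point is that the threshold check at the end is an integer comparison the model never sees.

```typescript
// Hypothetical goods line; money and rates are integers to avoid float drift.
interface DutyLine {
  customsValueCents: number; // customs value in euro cents
  rateBasisPoints: number;   // ad valorem rate: 420 means 4.20 %
}

const FOUR_EYES_THRESHOLD_CENTS = 2_500_000; // €25,000.00

// Duty for one line in cents, rounded half up (assumed rounding rule).
function lineDutyCents(line: DutyLine): number {
  const v = line.customsValueCents;
  const r = line.rateBasisPoints;
  if (!Number.isSafeInteger(v) || !Number.isSafeInteger(r) || v < 0 || r < 0) {
    throw new Error("customs value and rate must be non-negative integers");
  }
  const scaled = v * r; // cents x basis points, exact while below 2^53
  if (!Number.isSafeInteger(scaled)) throw new Error("value too large for exact arithmetic");
  return Math.floor((scaled + 5_000) / 10_000);
}

function totalDutyCents(lines: DutyLine[]): number {
  return lines.reduce((sum, line) => sum + lineDutyCents(line), 0);
}

// The gate: an integer comparison, nothing else.
function needsFourEyes(lines: DutyLine[]): boolean {
  return totalDutyCents(lines) >= FOUR_EYES_THRESHOLD_CENTS;
}
```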

Idempotency on the AGS submit

The AGS submit endpoint is not idempotent in any way you'd recognise from a modern API. If you POST the same XML twice with two different correlation IDs you get two declarations, two MRNs, and a phone call from the Douane.

We solved this with a single Postgres unique index and a deliberately boring outbox pattern.

create table ags_submissions (
  id            bigserial primary key,
  intent_hash   bytea       not null,
  aangifte_id   bigint      not null references aangiften(id),
  status        text        not null check (status in (
                  'pending','sent','acknowledged','rejected'
                )),
  ags_mrn       text,
  sent_at       timestamptz,
  ack_at        timestamptz,
  payload_xml   text        not null,
  created_at    timestamptz not null default now()
);

create unique index ags_submissions_intent_uniq
  on ags_submissions (intent_hash);

The intent_hash is a SHA-256 over the canonicalised XML payload plus the aangifte version. If the same payload tries to enter the outbox twice, Postgres refuses. The submitter worker only ever reads pending rows, sends them, and updates status. A crash mid-submission leaves a sent row with no acknowledgement; the reconciler polls AGS for the MRN by correlation ID and either fills in ags_mrn or flips status to rejected.
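Below is a sketch of the intent hash and the outbox contract. Its assumptions, stated up front:

- `canonicalise` only normalises line endings and whitespace between tags. That is a stand-in for real XML canonicalisation such as C14N, and not equivalent to it.
- The aangifte version is appended after a separator byte.
- An in-memory `Map` plays the role of the unique index.

`createHash` from `node:crypto` is the real Node.js API; every other name is hypothetical.

```typescript
import { createHash } from "node:crypto";

// Stand-in for real XML canonicalisation (e.g. C14N). NOT equivalent; illustration only.
function canonicalise(xml: string): string {
  return xml.replace(/\r\n?/g, "\n").replace(/>\s+</g, "><").trim();
}

// SHA-256 over the canonical payload plus the aangifte version, as hex.
function intentHash(payloadXml: string, aangifteVersion: number): string {
  return createHash("sha256")
    .update(canonicalise(payloadXml))
    .update("\u0000") // separator so payload and version cannot run together
    .update(String(aangifteVersion))
    .digest("hex");
}

type SubmissionStatus = "pending" | "sent" | "acknowledged" | "rejected";

interface OutboxRow {
  intentHash: string;
  payloadXml: string;
  status: SubmissionStatus;
  agsMrn: string | null;
}

// In-memory stand-in for ags_submissions with its unique index on intent_hash.
class Outbox {
  private readonly rows = new Map<string, OutboxRow>();

  // Returns false instead of inserting a second row for the same intent.
  enqueue(payloadXml: string, aangifteVersion: number): boolean {
    const hash = intentHash(payloadXml, aangifteVersion);
    if (this.rows.has(hash)) return false;
    this.rows.set(hash, { intentHash: hash, payloadXml, status: "pending", agsMrn: null });
    return true;
  }

  // The submitter worker only ever reads pending rows.
  pending(): OutboxRow[] {
    return [...this.rows.values()].filter(r => r.status === "pending");
  }
}
```

In Postgres, the same refusal would typically come from `INSERT ... ON CONFLICT (intent_hash) DO NOTHING` or from catching the unique violation. There, the hash would be stored as `bytea` rather than hex.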

This is unglamorous infrastructure. It is also the difference between sleeping at night and the customs broker calling at 23:00 because there are two MRNs for a single container.

NCTS, MRN, and the join everyone gets wrong

NCTS gives you an MRN. AGS gives you a different MRN. Cargonaut, for air freight, often gives you neither and identifies the consignment by AWB. The temptation is to write a fuzzy matcher. Don't.

What worked: a strict join table populated only by operations the agent itself performed, plus an explicit unknown_link row for anything it observed but didn't initiate. The agent never invents a link between an AGS aangifte and an NCTS transit. It either has a deterministic association from a workflow it ran, or it asks the operator to confirm and writes the answer back.
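Here is a sketch of that rule, with hypothetical identifiers and an in-memory registry standing in for the join table. Links are written only by a workflow the agent ran or by an operator's confirmation. Anything merely observed becomes an open question. There is deliberately no function that guesses.

```typescript
type LinkOrigin = "agent_workflow" | "operator_confirmed";

interface Link {
  agsAangifteId: string;
  nctsMrn: string;
  origin: LinkOrigin;
}

// Something observed in a source system that the agent did not initiate.
interface UnknownLink {
  system: "ags" | "ncts" | "cargonaut";
  externalId: string; // AGS aangifte id, NCTS MRN, or AWB
}

class LinkRegistry {
  private readonly links = new Map<string, Link>(); // keyed by AGS aangifte id
  private unknown: UnknownLink[] = [];

  // Only called from a workflow in which the agent itself filed both sides.
  recordFromWorkflow(agsAangifteId: string, nctsMrn: string): void {
    this.put({ agsAangifteId, nctsMrn, origin: "agent_workflow" });
  }

  recordObserved(system: UnknownLink["system"], externalId: string): void {
    this.unknown.push({ system, externalId });
  }

  // The operator's answer is written back as an explicit, attributable link.
  confirmByOperator(agsAangifteId: string, nctsMrn: string): void {
    this.put({ agsAangifteId, nctsMrn, origin: "operator_confirmed" });
    this.unknown = this.unknown.filter(
      u =>
        !(u.system === "ags" && u.externalId === agsAangifteId) &&
        !(u.system === "ncts" && u.externalId === nctsMrn),
    );
  }

  // No similarity score, no fallback: a recorded link or nothing.
  linkFor(agsAangifteId: string): Link | undefined {
    return this.links.get(agsAangifteId);
  }

  openQuestions(): UnknownLink[] {
    return [...this.unknown];
  }

  // Refuse to silently re-point an existing link at a different transit.
  private put(link: Link): void {
    const existing = this.links.get(link.agsAangifteId);
    if (existing && existing.nctsMrn !== link.nctsMrn) {
      throw new Error(`aangifte ${link.agsAangifteId} is already linked to ${existing.nctsMrn}`);
    }
    this.links.set(link.agsAangifteId, link);
  }
}
```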

Fuzzy matching customs documents reads beautifully in a design doc and produces lawsuits in production.

Scraping the 17-year-old portal

The Cargonaut frontend is about as old as the iPhone. It uses frames, server-side session cookies, and a CSRF token that lives in a hidden form field. There is no official API for the data we needed. So we drove it with a headless browser, slowly and politely.

The rules we agreed with the forwarder's IT contact:

At most one scraper session per hour during office hours, never in parallel.
The session uses a dedicated service account, not an operator's account.
Every request is logged with a request ID that survives in the audit trail.
If the portal returns a 5xx or a login page, the scraper stops and pages the operator instead of retrying.
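Two of those rules map directly onto code. This sketch assumes a hypothetical `PortalResponse` shape and a made-up login-form marker; detecting the real login page needs its own check. It shows a single-flight guard that refuses a second session while one is running. It also shows a classifier that turns a 5xx or a login page into "stop and page" rather than a retry.

```typescript
// Hypothetical response shape; a real headless-browser driver returns richer objects.
interface PortalResponse {
  status: number;
  body: string;
}

type Verdict = "ok" | "stop_and_page";

// Hypothetical marker for the portal's login form.
const LOGIN_MARKER = 'name="loginForm"';

// A 5xx or a bounce to the login page means: stop, do not retry.
function classify(res: PortalResponse): Verdict {
  if (res.status >= 500) return "stop_and_page";
  if (res.body.includes(LOGIN_MARKER)) return "stop_and_page";
  return "ok";
}

interface RunResult {
  responses: PortalResponse[];
  stopped: boolean;
}

// Single-flight guard: while one session runs, a second run is refused outright.
class ScraperSession {
  private running = false;

  async run(
    requests: Array<() => Promise<PortalResponse>>,
    page: (reason: string) => void,
  ): Promise<RunResult | null> {
    if (this.running) return null; // never in parallel
    this.running = true;
    try {
      const responses: PortalResponse[] = [];
      for (const send of requests) {
        const res = await send(); // strictly one request at a time
        if (classify(res) === "stop_and_page") {
          page(`portal returned HTTP ${res.status} or a login page; scraper stopped`);
          return { responses, stopped: true };
        }
        responses.push(res);
      }
      return { responses, stopped: false };
    } finally {
      this.running = false;
    }
  }
}
```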

The scraper is the only part of the system that ever flakes. We treat it like the unreliable narrator it is and reconcile its output against AGS, never the other way around.

Append-only as the audit log

Customs records have a seven-year retention obligation in the Netherlands. They also have a habit of being asked for, in court, in the worst possible quarter. Apart from status updates on the submission outbox, application code issues no UPDATE or DELETE statements against the agent's database. State changes are new rows.

This sounds expensive until you do the maths. At 3,180 aangiften a week, with roughly twelve state transitions each, the agent writes about 38,000 audit rows weekly. That's roughly two million rows a year. A modestly sized Postgres instance does not notice.
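Here is the arithmetic behind those figures, extended to the seven-year retention window on the assumption that volume stays flat:

```typescript
const aangiftenPerWeek = 3_180;
const transitionsPerAangifte = 12; // "roughly twelve", per the text
const weeksPerYear = 52;
const retentionYears = 7;

const rowsPerWeek = aangiftenPerWeek * transitionsPerAangifte; // 38,160
const rowsPerYear = rowsPerWeek * weeksPerYear;                // 1,984,320
const rowsRetained = rowsPerYear * retentionYears;             // 13,890,240

console.log({ rowsPerWeek, rowsPerYear, rowsRetained });
```

Even the full retention window comes to roughly 14 million rows.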

There's a related point that's been making the rounds: in Postgres, the only delete that scales without a migration plan is DROP TABLE. If you ever need to remove a large slice of data, partition by month or year and drop whole partitions. A large DELETE leaves dead rows behind that bloat the table until vacuum catches up, which can take weeks. Build that partitioning in from day one, even if you think you'll never need it. We didn't, and the cleanup window after our first retention review took longer than it should have.
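Below is a sketch of the retention side of monthly partitioning. Three things in it are assumptions for illustration:

- the table name `audit_events`;
- the partition naming scheme `audit_events_YYYY_MM`;
- the rule that a partition may go once its whole month is more than seven years old.

The generated `ALTER TABLE ... DETACH PARTITION` and `DROP TABLE` statements are standard PostgreSQL declarative-partitioning DDL. Old data leaves as whole tables instead of through a row-by-row `DELETE`.

```typescript
// Hypothetical layout: monthly partitions of audit_events named audit_events_YYYY_MM.
const PARENT_TABLE = "audit_events";
const PARTITION_NAME = /^audit_events_(\d{4})_(\d{2})$/;

// Month index (year * 12 + zero-based month) for a partition name, or null if no match.
function partitionMonthIndex(name: string): number | null {
  const m = PARTITION_NAME.exec(name);
  if (m === null) return null;
  return Number(m[1]) * 12 + (Number(m[2]) - 1);
}

// Partitions whose whole calendar month lies before (today's month minus retentionYears).
// Conservative: the partition for the cutoff month itself is kept.
function droppablePartitions(names: string[], today: Date, retentionYears: number): string[] {
  const cutoff = (today.getUTCFullYear() - retentionYears) * 12 + today.getUTCMonth();
  return names
    .filter(n => {
      const idx = partitionMonthIndex(n);
      return idx !== null && idx < cutoff;
    })
    .sort();
}

// Detach each partition from the parent, then drop it as an ordinary table.
function dropStatements(partitions: string[]): string[] {
  const sql: string[] = [];
  for (const p of partitions) {
    sql.push(`ALTER TABLE ${PARENT_TABLE} DETACH PARTITION ${p};`);
    sql.push(`DROP TABLE ${p};`);
  }
  return sql;
}
```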

Where the agent stops and the human starts

The agent doesn't sign declarations. It doesn't make legal judgements. It assembles the case, computes the duties, picks the queue, and writes a one-paragraph summary of why this aangifte is in the queue it's in. The senior expediteur reads the summary, opens the source documents in one click, and either approves or rejects.

What changed in the office: Monday morning reconciliation went from eight hours of one senior to under thirty minutes of two juniors clearing the high-value queue. The "declarations the team didn't know had been cancelled" category disappeared, because the agent flags cancellations within fifteen minutes of the Douane posting them. The senior expediteur stopped owning the Excel. He owns the policy, the thresholds, and the exceptions. The Excel is gone.

What we'd ship differently

Three things, with the benefit of nine months in production.

The migration to DMS is coming and we underestimated how much of the AGS XML coupling lives in agent code. If we were starting today we'd put a thin internal API in front of "the declaration system" from day one, with AGS as the first adapter and DMS as the second. Cheap to add up front; painful to retrofit.
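Here is a sketch of that seam, with every name hypothetical. Agent code depends on one `DeclarationSystem` interface, and AGS and DMS sit behind it as adapters. The adapters are stubs that only show where the system-specific coupling would live. Their payloads are placeholders, not the real AGS or DMS message formats.

```typescript
// Hypothetical internal model of a declaration, independent of AGS or DMS formats.
interface Declaration {
  aangifteId: string;
  version: number;
  lines: Array<{ goederencode: string; grossMassKg: number }>;
}

interface SubmitReceipt {
  system: "ags" | "dms";
  correlationId: string;
  mrn: string | null; // may only become known later, via reconciliation
}

// The one interface agent code depends on; adapters own all system-specific coupling.
interface DeclarationSystem {
  readonly name: "ags" | "dms";
  render(decl: Declaration): string;
  submit(payload: string, correlationId: string): Promise<SubmitReceipt>;
}

// Stub adapters: the payloads below are placeholders, NOT real AGS or DMS messages.
class AgsAdapter implements DeclarationSystem {
  readonly name = "ags" as const;
  render(decl: Declaration): string {
    return `<ags-stub id="${decl.aangifteId}" version="${decl.version}" lines="${decl.lines.length}"/>`;
  }
  async submit(_payload: string, correlationId: string): Promise<SubmitReceipt> {
    return { system: this.name, correlationId, mrn: null };
  }
}

class DmsAdapter implements DeclarationSystem {
  readonly name = "dms" as const;
  render(decl: Declaration): string {
    return `<dms-stub id="${decl.aangifteId}" version="${decl.version}" lines="${decl.lines.length}"/>`;
  }
  async submit(_payload: string, correlationId: string): Promise<SubmitReceipt> {
    return { system: this.name, correlationId, mrn: null };
  }
}

// Routing lives in one place: moving a lane to DMS means adding it to dmsLanes.
function systemFor(lane: string, dmsLanes: ReadonlySet<string>): DeclarationSystem {
  return dmsLanes.has(lane) ? new DmsAdapter() : new AgsAdapter();
}
```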

The four-eyes queue should have shipped with a service-level objective from week one. The forwarder cared about Monday morning latency more than total throughput, and we tuned for the wrong thing for the first six weeks. Define the SLO with the operator before you write the dashboard.
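An SLO only helps if it is computed the same way every week. This sketch assumes a hypothetical target of the form "share of Monday-morning four-eyes items decided within N minutes". The cutoff hour and the target minutes are placeholders to agree with the operator, not the figures we used.

```typescript
interface QueueItem {
  queuedAt: Date;
  decidedAt: Date | null; // null while the item is still waiting
}

// Assumption: "Monday morning" means Monday before cutoffHour. UTC is used for
// brevity; a real implementation should evaluate this in Europe/Amsterdam time.
function isMondayMorning(d: Date, cutoffHour: number): boolean {
  return d.getUTCDay() === 1 && d.getUTCHours() < cutoffHour;
}

// Share of items decided within targetMinutes. A waiting item counts as a miss
// once it is older than the target; younger waiting items are not judged yet.
// Returns null when nothing can be judged.
function sloAttainment(items: QueueItem[], targetMinutes: number, now: Date): number | null {
  const targetMs = targetMinutes * 60_000;
  let judged = 0;
  let met = 0;
  for (const item of items) {
    if (item.decidedAt === null) {
      if (now.getTime() - item.queuedAt.getTime() > targetMs) judged += 1;
      continue;
    }
    judged += 1;
    if (item.decidedAt.getTime() - item.queuedAt.getTime() <= targetMs) met += 1;
  }
  return judged === 0 ? null : met / judged;
}
```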

The scraper should have been the first thing isolated into its own process with its own restart policy. When it goes down, you want the rest of the agent to keep working on AGS and NCTS data without pretending Cargonaut is empty. We learned this at 02:00 on a Tuesday.
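The failure mode is treating "no data from Cargonaut" as "Cargonaut says there is nothing". Here is a sketch of one way to rule that out, with hypothetical types. Each source hands the reconciler either an `ok` snapshot or an explicit `unavailable`. When Cargonaut is unavailable, the reconciler records skipped checks instead of flagging every consignment as missing. Running the scraper in its own process is a deployment concern this sketch does not show.

```typescript
// Each source hands over either a usable snapshot or an explicit "unavailable".
type Snapshot<T> =
  | { status: "ok"; records: Map<string, T> }
  | { status: "unavailable"; since: Date };

// Hypothetical minimal record shapes, keyed by air waybill (AWB) number.
interface AgsLine { awb: string; grossMassKg: number }
interface CargonautRecord { awb: string }

type Finding =
  | { kind: "missing_in_cargonaut"; awb: string }
  | { kind: "check_skipped"; awb: string; reason: string };

function reconcileArrivals(ags: AgsLine[], cargonaut: Snapshot<CargonautRecord>): Finding[] {
  const findings: Finding[] = [];
  for (const line of ags) {
    if (cargonaut.status === "unavailable") {
      // Scraper down: record that the check did not run, not that cargo is missing.
      findings.push({
        kind: "check_skipped",
        awb: line.awb,
        reason: `cargonaut unavailable since ${cargonaut.since.toISOString()}`,
      });
      continue;
    }
    if (!cargonaut.records.has(line.awb)) {
      findings.push({ kind: "missing_in_cargonaut", awb: line.awb });
    }
  }
  return findings;
}
```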

The smallest thing you can do today

If you operate a process where two systems are supposed to agree and a human reconciles them in a spreadsheet every morning, spend one afternoon writing the read-only diff. Don't propose actions yet. Just count, every day for a week, how often the two systems disagree and by how much. That number is your business case, your error budget, and your reviewer queue threshold, all at once.
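Here is a minimal version of that afternoon's work. It assumes both systems can be exported as records keyed by a shared identifier, with one numeric field worth comparing, such as a weight or a value. The tolerance is a placeholder to set with the operator:

```typescript
interface DiffSummary {
  compared: number;    // keys present in both systems
  onlyInA: number;
  onlyInB: number;
  mismatched: number;  // |a - b| above tolerance
  maxAbsDelta: number; // largest mismatch seen
}

// Read-only diff of two exports keyed by a shared identifier.
function dailyDiff(
  a: Map<string, number>,
  b: Map<string, number>,
  tolerance: number,
): DiffSummary {
  let compared = 0;
  let onlyInA = 0;
  let mismatched = 0;
  let maxAbsDelta = 0;
  for (const [key, va] of a) {
    const vb = b.get(key);
    if (vb === undefined) {
      onlyInA += 1;
      continue;
    }
    compared += 1;
    const delta = Math.abs(va - vb);
    if (delta > tolerance) {
      mismatched += 1;
      maxAbsDelta = Math.max(maxAbsDelta, delta);
    }
  }
  let onlyInB = 0;
  for (const key of b.keys()) {
    if (!a.has(key)) onlyInB += 1;
  }
  return { compared, onlyInA, onlyInB, mismatched, maxAbsDelta };
}
```

Store one summary per day. After a week, `mismatched / compared` and `maxAbsDelta` give you the "how often" and "by how much" that the paragraph describes.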

When we built the reconciliation agent for the Den Haag forwarder, the part that took longest wasn't the AGS integration or the Cargonaut scraper. It was earning the right to call the submit endpoint, by being correct in silence for ten weeks first. That's the shape of every process-automation agent we've shipped since.

FAQ

Why build against AGS if the Belastingdienst is migrating to DMS?

Most import flows in 2026 still run on AGS in production. We built AGS first because that's where the volume lives. DMS goes behind the same internal declaration API once the production endpoint is generally available.

How long did the full build take?

Roughly nine months. Ten weeks of read-only observation, six weeks building the four-eyes queue and outbox, then a phased rollout where the agent took over one document type at a time, starting with the lowest-value lanes.

What happens when the agent and the operator disagree?

The operator's decision wins, always, and the disagreement is logged. We review disagreement clusters monthly. Patterns become either new rules in the agent's policy code or new notes for the operators.

Does the LLM ever pick the customs code or compute the duty?

No. Customs codes and duty calculations are deterministic lookups against the Tarief (the customs tariff) and run in code. The LLM summarises the case for the reviewer and drafts notes. Numbers and gates never come from the model.
