Customs reconciliation playbook: a four-eyes queue for AGS

19:00 on a Thursday in Roermond. Two declarations are stuck because somebody missed a dual-use classification, and the AGS endpoint already took them.

Jacob Molkenboer · Founder · A Brand New Company · 20 Nov 2025 · 10 min

It is 19:00 on a Thursday in Roermond. The team is heading home. Two customs declarations are stuck in the gateway because a broker missed a dual-use classification on a thermal imaging camera, and the AGS-submit endpoint has already taken the filing. The compliance lead now has to recall the declaration before the consignment leaves Rotterdam at 04:00. The recall costs a few hours and a small mountain of paperwork. The fix that would have prevented it is twelve lines of Python and one extra database row.

This is the playbook for that fix, written after we shipped it for a 31-person logistics-tech vendor whose brokers file around 5,400 douane-aangiften a week through a thirteen-year-old Descartes e-Customs gateway, with a homegrown PostgreSQL pakbon-ledger on the side. The agent we built reconciles the two systems on every change, parks anything that smells like dual-use, and refuses to call the AGS-submit endpoint until two human pairs of eyes have signed off. The shape of the playbook generalises. The detail does not, and that is where the work lives.

The shape of the work

Most customs back-offices look the same from a distance. A gateway that talks to the national customs system (in the Netherlands that is AGS, with DMS coming online for a growing slice of traffic). A packing-list or pakbon ledger that the warehouse owns. A couple of brokers who keep both in sync by hand. Mismatches happen all day. A pakbon gets re-cut after a partial shipment. A broker amends an HS code. The gateway returns a corrected MRN. None of that is unusual.

What kills you is the small fraction (less than one percent, in this client's case) of declarations that touch goods on the EU dual-use list. Under Regulation (EU) 2021/821, those need an export licence, a check against end-use, and a written record of who approved the filing. Miss one and you do not get a fine and a slap. You get a knock on the door from the Centrale Dienst voor In- en Uitvoer, and your AEO status starts to wobble.

So the agent we built has two jobs. Reconcile fast. Refuse to submit until two humans have confirmed the dual-use call.

The two systems

The Descartes side is a SOAP gateway from 2013. It accepts EDI-style XML, returns MRN numbers, and emits status events on a queue. It is not pleasant to talk to. It is also stable, has not changed schema in eight years, and the brokers have built habits around its exact failure modes. We did not replace it. We wrapped it.

The pakbon side is a PostgreSQL database the warehouse team grew over a decade. Eighty-three tables, no foreign keys on the hot path, and a habit of rewriting zending_id rows in place when a shipment is split. The warehouse owns that database. We do not get to refactor it. We get to read from it and write to a thin events table next to it.

Between them sits a Python service. Three components: a reconciler, a queue, and a submitter. The reconciler runs on a five-minute tick. The queue is a Postgres table. The submitter is a worker pool that talks to Descartes one declaration at a time.

The reconciliation contract

Reconciliation is a five-minute loop. For every zending that has changed in the last window (we use a last_touched_at trigger that the warehouse already had), the agent pulls the corresponding declaration from Descartes, compares the canonical fields, and either marks the pair clean or files a discrepancy.

The contract is short and worth writing down on paper:

CANONICAL_FIELDS = (
    "exporter_eori",
    "consignee_eori",
    "hs_code",
    "country_of_destination",
    "net_mass_kg",
    "invoice_value_eur",
    "incoterm",
    "licence_ref",
)

def reconcile(zending_id: str) -> Reconciliation:
    """Compare one zending's pakbon with its Descartes declaration."""
    pakbon = load_pakbon(zending_id)
    # Only ask the gateway when the pakbon already carries an MRN.
    aangifte = descartes.fetch(pakbon.mrn) if pakbon.mrn else None

    if aangifte is None:
        return Reconciliation.new(zending_id, status="pending_submit")

    # One Diff per canonical field whose values disagree.
    diffs = [
        Diff(field, getattr(pakbon, field), getattr(aangifte, field))
        for field in CANONICAL_FIELDS
        if getattr(pakbon, field) != getattr(aangifte, field)
    ]

    if not diffs:
        return Reconciliation.clean(zending_id, mrn=aangifte.mrn)

    return Reconciliation.discrepant(zending_id, diffs=diffs)

Nothing clever. The cleverness is in the eight fields. Pick fewer and you miss real discrepancies. Pick more and the brokers spend their afternoons clearing noise from cosmetic differences in addressing or packaging.
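To see what the contract produces, here is a self-contained sketch of the same field-by-field comparison over plain dictionaries. The Diff tuple, the diff_canonical helper, and the sample values are stand-ins for illustration, not the client's real types or data.

```python
from typing import Any, NamedTuple

CANONICAL_FIELDS = (
    "exporter_eori",
    "consignee_eori",
    "hs_code",
    "country_of_destination",
    "net_mass_kg",
    "invoice_value_eur",
    "incoterm",
    "licence_ref",
)


class Diff(NamedTuple):
    field: str
    pakbon_value: Any
    aangifte_value: Any


def diff_canonical(pakbon: dict, aangifte: dict) -> list[Diff]:
    """Return one Diff per canonical field whose values disagree."""
    return [
        Diff(field, pakbon.get(field), aangifte.get(field))
        for field in CANONICAL_FIELDS
        if pakbon.get(field) != aangifte.get(field)
    ]


# Made-up sample record: only the HS code differs between the two systems.
pakbon = {
    "exporter_eori": "NL000000001",
    "consignee_eori": "DE000000002",
    "hs_code": "90275000",
    "country_of_destination": "DE",
    "net_mass_kg": "12.5",
    "invoice_value_eur": "4800.00",
    "incoterm": "DAP",
    "licence_ref": None,
}
# "packaging" is not a canonical field, so it never shows up as a discrepancy.
aangifte = {**pakbon, "hs_code": "85258900", "packaging": "pallet"}

diffs = diff_canonical(pakbon, aangifte)
```

Fields outside the tuple are ignored by construction, which is how the contract keeps cosmetic differences out of the brokers' queue.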

Takeaway

Reconciliation is a contract, not a diff. Write the canonical fields down on paper with the broker who files the most declarations, before you write the code. Eight is usually the right number.

Dual-use as a special path

Every declaration that comes off the reconciler with status pending_submit goes through a dual-use check before it goes anywhere near Descartes. The check is not subtle. We maintain a local copy of Annex I of the dual-use regulation indexed by HS chapter, country of destination, and a few keyword heuristics on the goods description.

If the HS code matches a controlled category, or the description contains words on the watchlist (thermal, encryption, frequency, navigation, centrifuge, and another forty), the agent does not submit. It writes the declaration into the four-eyes queue and sends a Slack message to the compliance channel. The submit endpoint is never called.
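A minimal sketch of that screen, under stated assumptions: the five watchlist words are the ones named above, while the HS prefixes, the park_reason helper, and its return format are hypothetical. The real check also looks at country of destination, which this sketch leaves out.

```python
import re
from typing import Optional

# Excerpt only: the five words named in the text, not the full watchlist.
WATCHLIST = frozenset(
    {"thermal", "encryption", "frequency", "navigation", "centrifuge"}
)

# Hypothetical HS prefixes standing in for the local Annex I index.
CONTROLLED_HS_PREFIXES = ("8525", "9027")


def park_reason(hs_code: str, description: str) -> Optional[str]:
    """Return why a declaration must be parked, or None if nothing matched."""
    if hs_code.startswith(CONTROLLED_HS_PREFIXES):
        return f"hs_code {hs_code} matches a controlled category"

    # Split on anything that is not a letter, so "thermal-imaging" still hits.
    words = set(re.findall(r"[a-z]+", description.lower()))
    hits = sorted(words & WATCHLIST)
    if hits:
        return "watchlist: " + ", ".join(hits)

    return None
```

Any non-None result means the declaration goes to the four-eyes queue. There is no branch in which a match leads to a submit.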

This is the bit we will not compromise on. The agent is biased to park, not to submit. The classifier returns a confidence score between zero and one, but we never read it as an autosubmit signal. The compliance team would rather burn ten minutes confirming an obviously fine thermal-imaging shipment than skip the check on one that mattered. So the score sorts the queue. It does not drive the decision. The watchlist itself was built across a week of evenings, sitting next to the broker who has filed the most declarations on the team. Every word on it earned its place by being attached to a real declaration he could remember filing.

The four-eyes queue

The queue is a PostgreSQL table next to the pakbon ledger. It is not a message broker. It is not Kafka. It is not a workflow engine. It is a table. The reason is operational. When the warehouse team needs to know what is stuck, they open psql and run a query. They already know how to do that. Adding a new system means adding a new thing to learn at 22:00 on a Sunday, and that is when nothing should be new.

CREATE TABLE compliance_queue (
    id              bigserial PRIMARY KEY,
    zending_id      text NOT NULL,
    reason          text NOT NULL,
    payload         jsonb NOT NULL,
    parked_at       timestamptz NOT NULL DEFAULT now(),
    first_reviewer  text,
    first_decision  text CHECK (first_decision IN ('approve','reject')),
    first_at        timestamptz,
    second_reviewer text,
    second_decision text CHECK (second_decision IN ('approve','reject')),
    second_at       timestamptz,
    released_at     timestamptz,
    UNIQUE (zending_id, parked_at)
);

CREATE INDEX compliance_queue_open
    ON compliance_queue (parked_at)
    WHERE released_at IS NULL;

Two reviewers, two timestamps, two recorded decisions. The second reviewer cannot be the same person as the first (we check against the broker SSO). The agent only releases when both reviewer slots are filled in and both decisions are approve. Any reject from either reviewer closes the row with released_at = now(), and the pakbon is flagged in the warehouse system so it does not ship.
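The release rule is small enough to write as a pure function. This is a sketch of the rule as described, not the production code: the Review dataclass and the outcome labels are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    reviewer: Optional[str] = None
    decision: Optional[str] = None  # 'approve', 'reject', or None while pending


def queue_outcome(first: Review, second: Review) -> str:
    """Return 'release', 'closed' or 'waiting' for one queue row."""
    # Any reject closes the row; the declaration is never submitted.
    if "reject" in (first.decision, second.decision):
        return "closed"

    if first.decision == "approve" and second.decision == "approve":
        # The same person approving twice does not count as four eyes.
        if first.reviewer is None or first.reviewer == second.reviewer:
            return "waiting"
        return "release"

    return "waiting"
```

Keeping the rule in one function makes it easy to test every combination of reviewers and decisions before the worker ever touches the queue.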

For the worker that picks declarations off the queue once they are cleared, we use FOR UPDATE SKIP LOCKED. That keeps the design boring. No separate broker, no separate consensus protocol, no race between two workers picking the same row.

SELECT id, zending_id, payload
FROM compliance_queue
WHERE released_at IS NULL
  AND first_decision = 'approve'
  AND second_decision = 'approve'
ORDER BY parked_at
LIMIT 1
FOR UPDATE SKIP LOCKED;
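A sketch of how a worker might use that query, assuming a PEP 249 (DB-API 2.0) connection with %s placeholders, as psycopg provides. The process_one function and the handle callback are hypothetical names. The claim, the hand-off, and the released_at update share one transaction, so a crash midway rolls back and frees the row for another worker.

```python
CLAIM_SQL = """
SELECT id, zending_id, payload
FROM compliance_queue
WHERE released_at IS NULL
  AND first_decision = 'approve'
  AND second_decision = 'approve'
ORDER BY parked_at
LIMIT 1
FOR UPDATE SKIP LOCKED
"""

RELEASE_SQL = "UPDATE compliance_queue SET released_at = now() WHERE id = %s"


def process_one(conn, handle) -> bool:
    """Claim one cleared row, pass it to `handle`, then mark it released.

    Returns False when no cleared, unlocked row is available.
    """
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()
        if row is None:
            conn.rollback()  # nothing claimed; end the transaction
            return False

        row_id, zending_id, payload = row
        handle(zending_id, payload)  # e.g. the submitter; may raise
        cur.execute(RELEASE_SQL, (row_id,))
        conn.commit()  # releases the row lock and records the release
        return True
    except Exception:
        conn.rollback()  # the row becomes claimable again
        raise
    finally:
        cur.close()
```

The row lock is held while handle runs, so a slow gateway call keeps that one row locked. Other workers skip it and move on to the next cleared row.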

Idempotency before the AGS-submit endpoint

The Descartes submit call is the only place in this system where a mistake is irreversible. Once you have an MRN, you have told the Dutch customs authority that a shipment is going. Correcting that is paperwork. Doubling it is paperwork on fire.

So we wrap the submit endpoint in a strict idempotency layer. Every declaration gets a deterministic submission_key derived from the pakbon hash plus a monotonic version. The submitter records the key in a submissions table inside a transaction, calls Descartes, and only commits the row if the gateway returns an MRN. If the gateway times out, the row stays uncommitted, the key is still free, and the retry loop will try again with the same key.

def submit(declaration: Declaration) -> str:
    key = submission_key(declaration)

    with db.transaction() as tx:
        existing = tx.fetchone(
            "SELECT mrn FROM submissions WHERE submission_key = %s",
            (key,),
        )
        if existing and existing.mrn:
            return existing.mrn

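        # RETURNING yields a row only if this worker actually claimed the key.
        claimed = tx.fetchone(
            "INSERT INTO submissions (submission_key, declaration_id, started_at)"
            " VALUES (%s, %s, now())"
            " ON CONFLICT (submission_key) DO NOTHING"
            " RETURNING submission_key",
            (key, declaration.id),
        )
        if claimed is None:
            # Another worker committed this key while our insert waited;
            # reuse its MRN instead of filing a second time.
            return tx.fetchone(
                "SELECT mrn FROM submissions WHERE submission_key = %s",
                (key,),
            ).mrn

        mrn = descartes.submit(declaration)  # may raise; the rollback frees the key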

        tx.execute(
            "UPDATE submissions SET mrn = %s, committed_at = now()"
            " WHERE submission_key = %s",
            (mrn, key),
        )
        return mrn

The pattern matters more than the code. A submission key the warehouse can reproduce by hand. A row that exists before the network call. A commit that only happens after the gateway has acknowledged. Whatever you do, do not derive the key from a wall-clock timestamp. A clock skew between two workers will give you two keys for the same shipment, and you will discover this at the worst possible moment.
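One way to build such a key, assuming the pakbon can be reduced to JSON-serialisable fields: hash a canonical serialisation with SHA-256 and append the version. The name make_submission_key, the sample fields, and the key format are illustrative, not the production scheme.

```python
import hashlib
import json


def make_submission_key(pakbon_fields: dict, version: int) -> str:
    """Derive a deterministic key from pakbon content and a monotonic version.

    No clock, hostname or random value goes in, so any worker (or a person
    with the same fields and a SHA-256 tool) gets the same key.
    """
    # sort_keys and fixed separators make the serialisation order-independent.
    canonical = json.dumps(pakbon_fields, sort_keys=True, separators=(",", ":"))
    pakbon_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{pakbon_hash}:v{version}"


# Same content in a different key order gives the same submission key.
key_a = make_submission_key({"zending_id": "Z-1", "hs_code": "85258900"}, 3)
key_b = make_submission_key({"hs_code": "85258900", "zending_id": "Z-1"}, 3)
```

A new key appears only when the pakbon content or the version changes, which is the property the retry loop relies on.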

The shadow-mode month

Before we let the agent submit anything, it ran for thirty days in shadow mode. Every declaration the brokers filed by hand was also processed by the agent on a parallel path. The reconciliation result went into a parallel table, the dual-use decision into a parallel queue, and a nightly job diffed the two streams. The brokers never saw any of it. The point was to surface every disagreement before it mattered.
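The nightly diff reduces to comparing two decision streams keyed by zending. A minimal sketch, assuming each stream has been loaded into a dictionary of zending_id to decision; the function name and the decision labels are made up for illustration.

```python
def disagreement_rate(
    broker: dict[str, str], agent: dict[str, str]
) -> tuple[float, list[str]]:
    """Compare decisions per zending_id; return the rate and the disagreeing ids.

    Only zendingen present in both streams are counted.
    """
    shared = sorted(broker.keys() & agent.keys())
    disagreeing = [z for z in shared if broker[z] != agent[z]]
    rate = len(disagreeing) / len(shared) if shared else 0.0
    return rate, disagreeing


# Made-up night: the agent parked one shipment the broker had submitted.
broker_decisions = {"Z1": "submit", "Z2": "submit", "Z3": "park", "Z4": "submit"}
agent_decisions = {"Z1": "submit", "Z2": "park", "Z3": "park", "Z4": "submit"}

rate, ids = disagreement_rate(broker_decisions, agent_decisions)
```

The list of ids matters more than the rate: each one is a declaration to walk through with the broker who filed it.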

In week one the agent disagreed with the brokers on roughly four percent of declarations. By the end of week three we were at one percent, and almost all of the remaining disagreement was the agent parking shipments the brokers had cleared verbally at the desk. That is the failure mode you want. A false positive in the queue costs ten minutes of a reviewer's time. A false negative on a dual-use shipment costs your AEO status.

The shadow month was the only thing that gave the operations director the confidence to flip the production switch. We did not skip it on the next two clients either, even though by then we were sure of the code. The code was never the question. The trust was.

Observability that survives Monday morning

The agent logs three things, and only three things, in a way the operations team actually reads. A daily digest at 07:30 to the compliance channel with the count of declarations submitted, parked, and rejected. A real-time alert for any submission that takes more than thirty seconds to acknowledge. A weekly export of the four-eyes queue with median time-to-second-review.

That last one matters most. If the median second review takes more than four hours, the queue is not a safety net any more. It is a bottleneck, and the brokers will start to route around it (usually by asking a colleague to rubber-stamp the second approval). We watch that number harder than we watch any latency metric on the gateway. When it drifts, we add a reviewer or we tighten the watchlist. We never let it sit.
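A sketch of that number, computed from (parked_at, second_at) pairs read off the queue table. Measuring from parked_at rather than first_at is an assumption, and the function names are hypothetical. Rows still waiting for a second review are left out.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Iterable, Optional, Tuple

REVIEW_BUDGET = timedelta(hours=4)  # the four-hour threshold from the text

Row = Tuple[datetime, Optional[datetime]]  # (parked_at, second_at)


def median_second_review(rows: Iterable[Row]) -> Optional[timedelta]:
    """Median time from parking to second review, or None if none completed."""
    waits = [
        (second_at - parked_at).total_seconds()
        for parked_at, second_at in rows
        if second_at is not None
    ]
    if not waits:
        return None
    return timedelta(seconds=median(waits))


def queue_is_bottleneck(rows: Iterable[Row]) -> bool:
    """True when the median second review exceeds the budget."""
    result = median_second_review(rows)
    return result is not None and result > REVIEW_BUDGET
```

The median is used so that one declaration parked over a weekend does not trip the alarm on its own.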

What changed for the brokers

Before the agent went live, the Roermond team had two brokers spending roughly a day a week each chasing discrepancies between the pakbon ledger and the gateway. They now spend about an hour a week on it, almost all of it in the four-eyes queue, which is the time they should be spending. The dual-use checks that used to be a verbal "I think we are fine" at the desk are now a row with two names and two timestamps on it. That is the part the compliance auditors care about.

The first month after go-live, the queue caught two real dual-use mistakes that would have gone out otherwise. One was a frequency-jammer component bound for a Turkish freight forwarder where the end-use on the paperwork was ambiguous. The other was a thermal-imaging unit routed through a Singapore consignee with onward paperwork that suggested re-export to a destination on a sanctions shortlist. Neither was obvious from the pakbon. Both were caught because the classifier flagged the description and a second reviewer asked a question the first reviewer had not thought to ask.

None of this is exotic. The agent is a Python service, a Postgres table, and a respect for the systems that were already there. When we built the process-automation agent that runs this reconciliation, the temptation we kept resisting was the urge to replace the thirteen-year-old Descartes gateway with something modern. We did not. We wrapped it tighter, and the brokers stopped getting paged at 22:00 on Sunday.

If you run a customs back-office and you want to know where to start: write down your eight canonical fields tomorrow morning, on paper, with the broker who files the most declarations. That conversation is the whole playbook in miniature.

Key takeaway

Reconciliation is a contract, not a diff. Write your eight canonical fields on paper with the broker who files the most declarations, before you write the code.

FAQ

Why wrap the old Descartes gateway instead of replacing it?

It is stable, the brokers have habits around its failure modes, and the cost of swapping it dwarfs the cost of wrapping it. We wrapped tighter and let it keep doing its job.

Why eight canonical reconciliation fields and not fifteen?

Eight catches every real discrepancy we have seen. Fifteen drowns the brokers in cosmetic noise. The number is a contract with the people who clear the queue, not a technical choice.

What goes into the four-eyes queue?

Every declaration where the HS code or goods description matches the EU dual-use control list, or the destination is on a sanctioned-country shortlist. Two distinct reviewers must approve before submit.

Can the agent submit a declaration without human approval?

Yes, for the 99 percent that are not dual-use and reconcile cleanly. The interesting work is making the one percent visible and unmissable, not removing humans from the loop.

Why Postgres for the queue instead of a message broker?

The warehouse team already knows psql. At 22:00 on a Sunday, the right tool is the one your operators can query from memory. SKIP LOCKED handles the concurrency we need.

process automation · ai agents · integrations · workflow · operations · architecture
