Email automation

Email automation for cargo claims: 2h10 down to 14 minutes

Tuesday morning at a Maastricht coffee importer. The operations lead opens her inbox to 124 marine-cargo claim threads. Last year she spent 2h10 sorting them. This year she spends 14 minutes.

Jacob Molkenboer · Founder · A Brand New Company · 11 Jun 2026 · 10 min

The operations lead at a Maastricht specialty-coffee importer opens her laptop at 06:48. Sanco container SANU4521789 discharged at Rotterdam yesterday. 320 bags of Kenya AA from Embu were supposed to be dry. Two are not. The surveyor will be on the receiving floor in 90 minutes. Meanwhile her inbox has 137 unread threads, most about other containers, other claims, other notifications of loss waiting out their three-day Hague-Visby clock.

A year ago she would have spent until coffee break sorting them. This morning she will spend fourteen minutes. The reason is an email agent we built for her team in March. This is what it does, and what almost broke it.

The shape of the problem

The importer (33 people, around €18M turnover, four origin countries, two warehouses) had a steady operational nightmare. Marine cargo flows in via containers from Mombasa, Santos, Guayaquil and Sumatra. Roughly one in 14 containers triggers some form of cargo claim: water ingress from a cracked door seal, condensation damage from a reefer left in the sun at Algeciras, theft at Beirut, shortage at devanning. The claims correspondence runs through Lloyd's-syndicate cargo underwriters and a handful of agents.

The volume is not catastrophic. 620 claim-related email threads land in the shared inbox each week. The catastrophe is timing. Under the Hague-Visby Rules, Article III(6), the consignee has three days from delivery to give written notice of loss or damage that is not apparent on outturn. Miss the window and the carrier is presumed to have delivered the cargo in the condition described in the bill of lading. The insurer's recovery action against the carrier dies right there. The claim still pays out, but the loss adjuster notes the missed notice, and at the next renewal the importer eats it in premium.

So the operations lead's morning routine had become a triage exercise with legal teeth. Find the threads that mention a container that discharged in the last 72 hours. Find any that mention damage, shortage or notify. Cross-reference each container ID against the Sanco container feed to confirm the actual POD timestamp, because email subject lines lie. Send the notify-of-loss letter, or chase the agent who should have sent one. Then deal with everything else.

Two hours and ten minutes on a normal Tuesday. Three on a Monday after a quiet weekend at the carrier desk. By the time she reached the rest of the inbox, the warehouse was already on her radio.

What the agent does, in plain terms

We installed an email agent on the shared claims@ mailbox. It reads new messages over IMAP IDLE, classifies them, enriches them with container data, and either drafts a reply or moves the thread into one of four labelled buckets. The operations lead opens her inbox to those buckets already sorted. She works the urgent one, then the second one, then the rest by the time her coffee cools.

The four buckets:

  • NOL clock running. A claim that needs a notify-of-loss letter sent within the next 72 hours. Sorted by hours remaining.
  • Awaiting carrier or surveyor. We sent something. The other side needs to respond. We chase at day 5, day 10, day 14.
  • Awaiting us. The carrier or insurer has come back with a question. Sorted by claim size.
  • Reference only. No action this week. Filed, indexed, searchable.
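As a rough sketch, the buckets and the chase schedule could be modelled like this. The type names, field names and both helpers are illustrative assumptions rather than the production code, and "largest claim first" is our guess at what sorting by claim size means in practice.

```typescript
// Hypothetical names throughout. Only the four buckets, their sort keys
// and the day 5 / 10 / 14 chase schedule come from the list above.
type Bucket = 'nol-clock-running' | 'awaiting-them' | 'awaiting-us' | 'reference-only'

type TriagedThread = {
  threadId: string
  bucket: Bucket
  hoursRemaining: number // hours left on the NOL clock
  claimSizeEur: number   // estimated claim size
}

const DAY_MS = 24 * 3600 * 1000
const CHASE_DAYS = [5, 10, 14]

// Next chase time for something we sent, or null once all three
// scheduled chases are in the past.
function nextChaseAt(sentAt: Date, now: Date): Date | null {
  for (const day of CHASE_DAYS) {
    const due = new Date(sentAt.getTime() + day * DAY_MS)
    if (due.getTime() > now.getTime()) return due
  }
  return null
}

// Threads of one bucket in display order: least time remaining first
// for the NOL bucket, largest claim first (assumed) for "awaiting us".
function sortBucket(threads: TriagedThread[], bucket: Bucket): TriagedThread[] {
  const inBucket = threads.filter(t => t.bucket === bucket) // filter() copies
  if (bucket === 'nol-clock-running') {
    return inBucket.sort((a, b) => a.hoursRemaining - b.hoursRemaining)
  }
  if (bucket === 'awaiting-us') {
    return inBucket.sort((a, b) => b.claimSizeEur - a.claimSizeEur)
  }
  return inBucket
}
```

Keeping the chase days in one array means the schedule can change without touching the logic.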

Classification is a small model fine-tuned on three months of the operations lead's manual sorting. We watched her drag threads into folders over IMAP, treated those moves as labels, and trained on the result. Six weeks in, her agreement with the model sat at 97%. The remaining 3% were threads where two competent humans would have disagreed about the right bucket too.

That sounds simple. The interesting work happens in the enrichment step.

The Sanco feed, and the subject-line problem

Every claim-relevant thread mentions at least one container ID. They are regex-friendly: four letters then seven digits, the last of which is a check digit that you can validate with the ISO 6346 algorithm. Easy to extract.
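For readers who have not met the check digit before, here is a minimal sketch of extraction plus validation. The function names are ours for illustration; the arithmetic follows the published ISO 6346 scheme (letter values starting at A=10 and skipping multiples of 11, position weights of 2^i, sum modulo 11, with 10 mapped to 0).

```typescript
// Letter values per ISO 6346: A=10, then counting upward while
// skipping multiples of 11, so B=12, L=23, V=34, Z=38.
function letterValue(ch: string): number {
  let value = 10
  for (let code = 'A'.charCodeAt(0); code < ch.charCodeAt(0); code++) {
    value++
    if (value % 11 === 0) value++
  }
  return value
}

// True if the string is four letters plus seven digits and the
// eleventh character matches the computed check digit.
function isValidContainerId(id: string): boolean {
  if (!/^[A-Z]{4}\d{7}$/.test(id)) return false
  let sum = 0
  for (let i = 0; i < 10; i++) {
    const ch = id.charAt(i)
    const value = i < 4 ? letterValue(ch) : Number(ch)
    sum += value * (1 << i) // position weight 2^i
  }
  const check = (sum % 11) % 10 // a remainder of 10 maps to 0
  return check === Number(id.charAt(10))
}

// Pull every well-formed, check-digit-valid ID out of free text,
// de-duplicated and in order of first appearance.
function extractContainerIds(text: string): string[] {
  const candidates: string[] = text.match(/\b[A-Z]{4}\d{7}\b/g) ?? []
  return candidates
    .filter((id, i) => candidates.indexOf(id) === i)
    .filter(isValidContainerId)
}
```

The check digit catches most typos, but it cannot tell a real container from any other string that happens to satisfy the same arithmetic.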

The trap is that the email subject reads Re: SANU4521789 - notice of loss but the body might list three containers, only one of which actually discharged in Rotterdam this week. The other two finished discharge in Antwerp eleven days ago, well past the notice window, and they are cited as context, not as the subject of the claim.

So we do not trust the subject. We pull every container ID out of the body, validate each one, and look each up against the Sanco container milestone feed. We poll the feed's JSON endpoint every fifteen minutes and write the results to a small staging table. For each container the relevant fields are the actual gate-out date, the proof-of-delivery date, and the discharge port. The notify-of-loss clock starts at proof of delivery, not at gate-out. We learned that the expensive way.

Here is the deadline calculator, simplified for readability:

type ContainerEvent = {
  containerId: string
  dischargePort: string
  podAt: string | null     // ISO timestamp, may be null
  gateOutAt: string | null
}

function nolDeadline(event: ContainerEvent, now: Date): {
  deadline: Date | null
  hoursRemaining: number | null
  basis: 'pod' | 'gate-out-fallback' | 'unknown'
} {
  // Hague-Visby Article III(6): three days from delivery for
  // non-apparent damage. We treat POD as delivery.
  if (event.podAt) {
    const pod = new Date(event.podAt)
    const deadline = new Date(pod.getTime() + 3 * 24 * 3600 * 1000)
    return {
      deadline,
      hoursRemaining: (deadline.getTime() - now.getTime()) / 3600_000,
      basis: 'pod',
    }
  }
  // No POD yet. Fall back to gate-out plus a buffer for
  // transport to the consignee warehouse. Flag for review.
  if (event.gateOutAt) {
    const gate = new Date(event.gateOutAt)
    const deadline = new Date(gate.getTime() + 5 * 24 * 3600 * 1000)
    return {
      deadline,
      hoursRemaining: (deadline.getTime() - now.getTime()) / 3600_000,
      basis: 'gate-out-fallback',
    }
  }
  return { deadline: null, hoursRemaining: null, basis: 'unknown' }
}

The fallback path matters. About 4% of claims arrive before the Sanco feed has caught up with the POD. We do not want the agent to silently downgrade those threads. The basis field gets surfaced in the inbox label, so the operations lead can see at a glance which deadlines are firm and which are estimates.
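To make the firm-versus-estimate distinction concrete, here is one way the basis could be turned into an inbox label. The label wording and the `nolLabel` helper are hypothetical; only the three basis values come from the calculator above.

```typescript
type Basis = 'pod' | 'gate-out-fallback' | 'unknown'

// Turn the calculator's output into a short label. Hours are rounded
// down so the label never overstates the time left.
function nolLabel(hoursRemaining: number | null, basis: Basis): string {
  if (hoursRemaining === null || basis === 'unknown') {
    return 'NOL: no deadline (review)'
  }
  const certainty = basis === 'pod' ? 'firm' : 'estimate'
  if (hoursRemaining <= 0) return `NOL: overdue (${certainty})`
  const approx = basis === 'pod' ? '' : '~'
  return `NOL: ${approx}${Math.floor(hoursRemaining)}h left (${certainty})`
}
```

An estimate gets both a tilde and an explicit word, so it is hard to mistake for a firm deadline when scanning a long list.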

What broke

Three things broke in the first six weeks. Each one taught us something.

The bonded warehouse pass-through

Two containers a month transit a bonded warehouse in Antwerp before continuing to Maastricht. The Sanco feed considers the Antwerp drop the POD. The actual receiving floor is in Limburg, three days later. We were closing notify-of-loss clocks that had not actually started. The fix: a small lookup table of bonded handlers per origin route, with a per-container override.
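A sketch of what such a lookup could look like follows. Everything here is an assumption made for illustration: the `originPort` field, the table contents (the Mombasa entry is a placeholder, not a statement about the real routes), and the idea that the true delivery time comes from a warehouse receipt timestamp.

```typescript
type RoutedEvent = {
  containerId: string
  originPort: string     // hypothetical field, not in the feed type above
  dischargePort: string
  podAt: string | null
}

// Origin route -> port where the feed's "POD" is really a bonded drop.
const BONDED_DROP_BY_ORIGIN: Record<string, string | undefined> = {
  Mombasa: 'Antwerp', // placeholder entry
}

// Per-container override: true forces bonded handling, false disables it.
const BONDED_OVERRIDE: Record<string, boolean | undefined> = {}

function isBondedDrop(event: RoutedEvent): boolean {
  const override = BONDED_OVERRIDE[event.containerId]
  if (override !== undefined) return override
  return BONDED_DROP_BY_ORIGIN[event.originPort] === event.dischargePort
}

// The POD the deadline calculator should see. For a bonded drop the
// feed's POD is ignored in favour of the warehouse receipt time, which
// may still be null, meaning the clock has not started yet.
function effectivePodAt(event: RoutedEvent, warehouseReceivedAt: string | null): string | null {
  return isBondedDrop(event) ? warehouseReceivedAt : event.podAt
}
```

Returning null for a bonded container that has not yet reached the receiving floor keeps its deadline out of the firm category until a real delivery time exists.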

The reply-all reply storm

A claims adjuster's auto-responder went into a loop with a survey company's auto-responder. The agent saw 38 messages from the same thread in two minutes and dutifully drafted 38 replies. We had a queue, but no per-thread debounce. We added one: maximum one outbound draft per thread per six hours, with a manual override the operations lead can click. No production replies were sent. The drafts sat in her outbox waiting for her hand.

That was the first lesson, and the one that should have been most obvious in advance. Any email agent that drafts replies should draft only. Production send is a separate, deliberate user action. We have not revisited the decision since, and the recent run of headlines about agents touching production systems unsupervised has not changed our mind.
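The per-thread debounce is simple enough to sketch in a few lines. This is an in-memory illustration with hypothetical names, not the production queue; a real deployment would presumably persist the timestamps so a restart does not reset the window.

```typescript
const SIX_HOURS_MS = 6 * 3600 * 1000

class DraftDebouncer {
  // threadId -> time of the last draft we allowed, in ms since epoch
  private lastDraftAt: Record<string, number | undefined> = {}

  // Returns true if a draft may be created now, and records it.
  // manualOverride models the button the operations lead can click.
  shouldDraft(threadId: string, now: Date, manualOverride: boolean = false): boolean {
    const last = this.lastDraftAt[threadId]
    const tooSoon = last !== undefined && now.getTime() - last < SIX_HOURS_MS
    if (tooSoon && !manualOverride) return false
    this.lastDraftAt[threadId] = now.getTime()
    return true
  }
}
```

Replaying the storm against this, 38 messages on one thread inside two minutes yield a single draft, and the next one becomes possible six hours later or on manual override.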

The container ID that was not one

A long-running claim thread referenced a project code that happened to match the ISO 6346 format: four letters, seven digits, valid check digit. The agent kept trying to enrich it against the container feed and getting an empty response. It correctly fell back to the unknown basis, but the empty enrichment was triggering a daily Slack alert. We added a confidence check: a container ID only counts if its carrier prefix matches a known carrier registered with the importer's traffic department.
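In code, that check amounts to an allowlist applied after format validation. The sketch below treats the first four letters as the prefix and uses made-up list contents; both are assumptions for illustration.

```typescript
type PrefixCheck = {
  accepted: string[] // enrich these against the container feed
  ignored: string[]  // well-formed, but not from a registered carrier
}

// Split candidate IDs by whether their four-letter prefix is on the
// list of carriers the traffic department knows about.
function partitionByPrefix(ids: string[], knownPrefixes: string[]): PrefixCheck {
  const result: PrefixCheck = { accepted: [], ignored: [] }
  for (const id of ids) {
    const known = knownPrefixes.indexOf(id.slice(0, 4)) !== -1
    if (known) result.accepted.push(id)
    else result.ignored.push(id)
  }
  return result
}
```

Keeping the ignored IDs rather than dropping them makes it possible to log them quietly instead of raising an alert.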

What the numbers say

Six weeks after rollout, we measured. The methodology was simple: she started a timer when she opened the inbox and stopped it when she had triaged everything that was not deferred to reference-only. We averaged over twelve working days.

  • Average morning triage: 14 minutes, down from 2h10.
  • Missed notify-of-loss deadlines in the measurement period: 0, against a baseline of three in the prior quarter.
  • Claim-thread volume processed: 620 per week, of which the agent fully resolved 71% without human input. The rest she touched.
  • Total agent draft replies sent to production after her review: around 180 per week. She rewrites about one in five.

One number we did not formally track but watched anyway. Classification confidence on Portuguese threads ran roughly eight points lower than on English ones in the first month, and three points lower by week six. We did not retrain the model. The improvement came from the operations lead's corrections feeding back into the labelled set as drift-correction examples, and from her habit of forwarding any thread the agent had visibly misread to a small drift@ alias we set up for exactly this purpose.

The euro figure is harder to pin down because the recovery value of any single claim swings between €600 and €40,000. The insurer's loss adjuster told us, off the record, that the missed-notice rate at this importer had gone from noticeable to best in the book of business. That will show up at renewal.

Takeaway

An email agent earns its keep when the cost of a missed deadline is high and the inbox volume is steady. Most inboxes do not qualify. Marine cargo claims do.

What we would not do again

Three architectural calls we made in week one we would revisit.

First, we hosted the Sanco feed staging table in the same Postgres as the agent's own state. That couples two things that change at different rates. The feed schema is owned by the carrier, who will rename a column on a Tuesday. The agent's state is ours. Next time we keep them in separate schemas at minimum and probably separate databases.

Second, we wrote our own retry logic for the IMAP IDLE connection. That code is now 280 lines and has had four bugs. There are good libraries for this. We did not need to write it. The same goes for the orchestration loop: open-source projects like Burr already model agent state machines properly, and rolling your own buys you debugging time you did not budget for.

Third, we built a language classifier to route prompts to language-specific templates. In practice a marine cargo claim is half English boilerplate wrapped around two paragraphs of whatever language the origin agent writes in, often with a stray sentence in a third language from a co-loader. We now feed every body to one multilingual prompt and let it sort itself out. The classifier was solving a problem that production did not actually have.

The smallest thing you could do today

If you run an operations inbox that handles deadline-sensitive correspondence, open it now and count two numbers. How many threads landed in the last 24 hours? How many of those reference an external identifier (a container, an invoice number, a case number) that you could resolve against an authoritative system in code? If both numbers are non-trivial, you have an agent-shaped problem.

When we built the claims agent for the Maastricht importer, the thing we ran into was not model accuracy. It was the boring stuff: container ID disambiguation, POD versus gate-out semantics, debounce on auto-responders. The language layer handled English, Dutch, Portuguese and Spanish without a second thought. If you want to talk about whether your inbox has the same shape, our AI agents work usually starts with a one-hour audit of your last 200 threads. That is the conversation, not a sales pitch.

FAQ

How does the agent handle non-English claim correspondence?

It reads English, Dutch, Portuguese and Spanish out of the box. Container IDs and dates are language-agnostic. Free-text body content goes to a single multilingual prompt rather than a routed per-language one, which we found works better in practice than language detection.

What happens if the Sanco feed is delayed or down?

The deadline calculator falls back to gate-out date plus five days and flags the thread for human review. The operations lead sees an explicit estimate label, not a firm deadline.

Does the agent send replies on its own?

No. It drafts. The operations lead clicks send. A runaway auto-responder loop in week three convinced us that the draft-only boundary is the safe one, and we have kept it.

How long did the rollout take?

Two weeks of build, four weeks of supervised parallel running. We did not switch off the manual workflow until missed-deadline count had been zero for three consecutive weeks.

email automation · ai agents · case study · workflow · operations · integrations
