Email automation

Zalando RMA reconciliation: how an email agent saves €18k

Every Friday afternoon, the finance lead at a Tilburg fashion wholesaler used to write off Zalando credit notes she didn't have time to dispute. We built her an email agent.

Jacob Molkenboer · Founder · A Brand New Company · 10 Jun 2026 · 10 min

It is 16:40 on a Friday. The finance lead at a 36-person fashion wholesaler in Tilburg has 47 Zalando credit notes open across two browser tabs, a Picqer return report on her second monitor, and a school pickup at 17:15. The credit notes total €4,200 in chargebacks. Some of them look wrong. She accepts all of them, closes the laptop, and walks out.

Multiply that Friday by 50 weeks and you have the order of magnitude we walked into when the company called us. The finance lead was not lazy. She was rational. There was no shape of work that fit 1,800 weekly RMAs into one person's calendar, so the rational move was to absorb the loss and ship next week's orders.

This is the story of the email agent we built to do that reconciliation, what it caught, and the parts that surprised us along the way.

The shape of a Zalando RMA email

Every weekday the client's returns@ mailbox receives a stack of automated Zalando messages. Each one represents a customer-initiated return that has now reached Zalando's warehouse, been inspected, and been billed back to the supplier. A typical message has a short body, a CSV attachment, and a PDF credit note. The CSV is the source of truth Zalando wants you to use: it has the RMA reference, the order ID, the items the customer sent back, a reason code (defective, doesn't fit, changed mind, and so on), and a per-line refund value.

The catch is that the CSV is Zalando's account of what came back. It is not a scan of what actually arrived in their warehouse. It is also not a check against what your warehouse shipped in the first place. Those are two separate joins, and they live in two separate systems. Whoever does the reconciliation owns both joins.

What Picqer knows that Zalando doesn't

The client runs their warehouse on Picqer, a Dutch fulfilment platform that is common among small and mid-sized e-commerce businesses in the Benelux. Picqer tracks the outbound side (what you shipped, when, with which EAN) and, more usefully here, the return side: what physically came in the door, scanned by your team, condition graded, and photographed if the grade is anything other than new with tags.

That means Picqer can answer questions a Zalando credit note cannot:

  • Did the customer return three items, or just two?
  • Is the SKU on the credit note actually a SKU we shipped on that order?
  • When the credit note says "defective", did our return desk grade the item as defective, or as new with tags?

If you can answer those three questions before the credit note is auto-paid, you can dispute the wrong ones. If you can't, you write off the difference. That was the status quo at the client for two and a half years.

The join nobody wants to do by hand

The math is brutal. 1,800 RMAs a week is 360 a day. At 90 seconds per reconciliation (open the CSV, find the order in Picqer, eyeball the line items, decide), that is nine hours of finance time per day. The company has one finance lead. So the work does not get done, the credit notes get accepted, and the loss gets booked under "channel friction" on the P&L.
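The arithmetic behind that nine-hour figure, as a quick sketch. The five-day working week is the assumption implied by "360 a day", and the variable names are ours.

```javascript
// Workload estimate from the figures above; a five-day week is assumed.
const rmasPerWeek = 1800;
const workdaysPerWeek = 5;
const secondsPerReconciliation = 90;

const rmasPerDay = rmasPerWeek / workdaysPerWeek;
const hoursPerDay = (rmasPerDay * secondsPerReconciliation) / 3600;

console.log(`${rmasPerDay} RMAs a day, ${hoursPerDay} hours a day`);
```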

The cost of the loss showed up clearly only when we did the audit. Over six weeks of historic data, mislabelled credit notes accounted for €18,400 of write-offs per month. That does not mean all 1,800 weekly RMAs needed a dispute. It is the slice where Picqer and Zalando disagreed by enough money to be worth disputing.

The audit itself took three days of pairing CSVs against Picqer exports in a spreadsheet. That is the exact work we wanted the agent to do, and doing it once by hand was the only way to learn which shapes of mismatch were common, which ones were costly, and which ones a model could be trusted to flag. We have started every agent project this way since. The shortest path to a useful agent is to spend a week being the agent.

Architecture

The agent has four jobs: pull the Zalando emails, extract structured data, fetch the matching record from Picqer, and write a dispute draft when the two disagree. Nothing it does is exotic. The value is in stitching the steps together and running them on every inbound message instead of every Friday afternoon.
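To make the stitching concrete, here is a minimal sketch of the last three jobs run against one inbound message (pulling the message from the mailbox is assumed to happen upstream). It is an illustration, not the production code: `processMessage` and the four injected function names are placeholders we made up for this example.

```javascript
// Sketch of the per-message pipeline. Every dependency is injected, and
// the names extract, fetchReturn, reconcile and draftDispute are
// placeholders for illustration.
async function processMessage(message, deps) {
  const { extract, fetchReturn, reconcile, draftDispute } = deps;

  const zalando = await extract(message);                 // structured record
  const picqer = await fetchReturn(zalando.rmaReference); // warehouse record
  const result = reconcile(zalando, picqer);              // plain-code comparison

  // Only write a draft when the two systems disagree.
  const draft = result.verdict === 'dispute'
    ? await draftDispute(zalando, picqer, result)
    : null;

  return { rmaReference: zalando.rmaReference, ...result, draft };
}
```

Injecting the steps keeps each one testable on its own, which matters when the same function runs on every inbound message.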

We read the mailbox over IMAP. If you are choosing between revisions, RFC 9051 is the current IMAP4rev2 spec and is what we target for new builds. The extraction step is a single model call that takes the email body and the attached CSV and returns a typed record with the RMA reference, the order ID, the line items, and the credit total. The Picqer join is one API call.
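A typed record is only useful if something checks it before the reconciler sees it. The sketch below is one way to do that, not the client's code. The field names `lines`, `sku`, `quantity` and `reason` follow the reconciler in this post; `rmaReference`, `orderId`, `refundCents` and `creditTotalCents` are names we assume for illustration, with amounts held as integer cents. The rule that line refunds must add up to the credit total is also an assumption.

```javascript
// Validate an extracted record. Returns a list of problems; an empty
// list means the record is safe to hand to the reconciler.
function validateExtraction(record) {
  const errors = [];
  if (typeof record.rmaReference !== 'string' || !record.rmaReference) {
    errors.push('missing rmaReference');
  }
  if (typeof record.orderId !== 'string' || !record.orderId) {
    errors.push('missing orderId');
  }
  if (!Array.isArray(record.lines) || record.lines.length === 0) {
    errors.push('no line items');
    return errors;
  }
  let sum = 0;
  for (const [i, line] of record.lines.entries()) {
    if (typeof line.sku !== 'string' || !line.sku) {
      errors.push(`line ${i}: missing sku`);
    }
    if (!Number.isInteger(line.quantity) || line.quantity < 1) {
      errors.push(`line ${i}: bad quantity`);
    }
    if (!Number.isInteger(line.refundCents)) {
      errors.push(`line ${i}: bad refundCents`);
    } else {
      sum += line.refundCents;
    }
  }
  // Assumption: per-line refunds should add up to the credit total.
  if (sum !== record.creditTotalCents) {
    errors.push(`lines sum to ${sum}, credit total is ${record.creditTotalCents}`);
  }
  return errors;
}
```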

For the extraction model the choice matters less than people expect. We use a small general-purpose model and only route genuinely ambiguous attachments to something larger. At 1,800 messages a week the per-call cost dominates the total bill, so we tune for cost before accuracy and let the reconciler downstream do the actual judgement. The reconciler is plain code, not an LLM, because plain code is what you want when the question is "does this number match that number".
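The small-first routing can be sketched as below. This is an illustration under assumptions: `small`, `large` and `validate` are injected placeholders, and treating "the small model's output fails validation" as the definition of an ambiguous attachment is our simplification for the example.

```javascript
// Try the small model first; escalate only when its output fails
// validation. validate() returns a list of problems (empty means OK).
async function extractWithFallback(input, { small, large, validate }) {
  const first = await small(input);
  if (validate(first).length === 0) {
    return { record: first, model: 'small', errors: [] };
  }
  const second = await large(input);
  // Return any remaining errors so the caller can park the message
  // for a human instead of reconciling a bad record.
  return { record: second, model: 'large', errors: validate(second) };
}
```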

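// Assumes SUBDOMAIN and PICQER_KEY are set in the environment.
const { SUBDOMAIN, PICQER_KEY } = process.env;

// The Picqer join: look up the return record by Zalando's RMA reference.
async function fetchPicqerReturn(rmaReference) {
  const url = `https://${SUBDOMAIN}.picqer.com/api/v1/returns` +
              `?reference=${encodeURIComponent(rmaReference)}`;
  // HTTP Basic auth expects base64("username:password"). Picqer takes the
  // API key as the username with an empty password.
  const token = Buffer.from(`${PICQER_KEY}:`).toString('base64');
  const res = await fetch(url, {
    headers: { Authorization: `Basic ${token}` },
  });
  if (!res.ok) throw new Error(`Picqer ${res.status}`);
  const hits = await res.json();
  return hits[0] ?? null; // null when Picqer has no return for this RMA
}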

The reconciler is small. It walks the Zalando line items, finds the matching Picqer scan by product code, and records what disagrees.

function reconcile(zalando, picqer) {
  // No Picqer record at all: Zalando billed a return we never received.
  if (!picqer) {
    return { verdict: 'dispute', reason: 'no_return_received' };
  }
  const issues = [];
  for (const line of zalando.lines) {
    // Match each billed line to what the return desk actually scanned.
    const scan = picqer.products.find(p => p.productcode === line.sku);
    if (!scan) {
      issues.push({ type: 'sku_not_received', sku: line.sku });
      continue;
    }
    // Billed for more units than physically came back.
    if (scan.amount < line.quantity) {
      issues.push({
        type: 'quantity_mismatch',
        sku: line.sku,
        billed: line.quantity,
        received: scan.amount,
      });
    }
    // Zalando says defective, our desk graded it new with tags.
    if (line.reason === 'defective' &&
        scan.condition === 'new_with_tags') {
      issues.push({ type: 'condition_mismatch', sku: line.sku });
    }
  }
  // Any single issue is enough to turn an accept into a dispute.
  return {
    verdict: issues.length ? 'dispute' : 'accept',
    issues,
  };
}

When the verdict is dispute, the agent drafts a reply to Zalando's partner inbox citing the RMA number, the Picqer return ID, and the specific line items in disagreement. The draft sits in the finance lead's outbox until she presses send.
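A minimal sketch of how such a draft body might be assembled from the reconciler's output. The wording is illustrative, and the `rmaReference`, `orderId` and `idreturn` field names are assumptions made for the example rather than confirmed fields of either system.

```javascript
// Build the plain-text body of a dispute draft from a reconcile() result.
// Field names on zalando and picqer are assumed for illustration.
function draftDisputeBody(zalando, picqer, result) {
  const describe = {
    sku_not_received: i => `${i.sku}: billed, but not in our return scan`,
    quantity_mismatch: i =>
      `${i.sku}: billed ${i.billed}, received ${i.received}`,
    condition_mismatch: i =>
      `${i.sku}: billed as defective, graded new with tags on receipt`,
  };
  const lines = (result.issues ?? []).map(i => `- ${describe[i.type](i)}`);
  if (result.reason === 'no_return_received') {
    lines.push('- No return for this RMA has been received at our warehouse');
  }
  return [
    `Dispute for RMA ${zalando.rmaReference} (order ${zalando.orderId})`,
    `Picqer return ID: ${picqer ? picqer.idreturn : 'none on record'}`,
    '',
    'Line items in disagreement:',
    ...lines,
  ].join('\n');
}
```

Because the body is built from the issue list and nothing else, every sentence in the draft traces back to a specific comparison the finance lead can check.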

Mislabelled credit notes, in three flavours

The €18,400 monthly recovery breaks into three failure modes. We expected one of them. The other two were the part nobody knew was happening.

Flavour one: condition disagreements. The customer returned a dress in pristine condition. Our return desk scanned it as new with tags. Zalando's inspection graded it as defective and charged us the wholesale value. About 38% of the recovery sits here. These are also the disputes Zalando is most likely to accept, because our scan log is timestamped, photographed, and signed by a named warehouse operator. The audit photo is the load-bearing piece of evidence; if you do not photograph graded returns today, start.

Flavour two: quantity inflation. The customer returns two items, the credit note bills for three. Sometimes this is a Zalando packing error (a stray item from an adjacent return got scanned into ours), sometimes it is a reason-code change mid-flight. About 31% of the recovery. These are dull, mechanical disputes that the agent files almost without judgement.

Flavour three: ghost SKUs. The credit note references a product code that we never shipped on the original order. This was the surprise. Some of these are EAN mappings going stale between systems (a SKU was reissued under a new code, but the old code is still on the original picking slip). Others are genuine Zalando errors. About 31% of the recovery, and the hardest to win, because you have to argue from the shipping manifest, not just the return scan. We found these only because the audit compared product codes against the original picking slip rather than trusting the return scan alone. If your reconciliation logic starts at the return scan you will never see them.
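Separating a ghost SKU from an ordinary missing return means checking each credit-note line against the shipment first and the return scan second. A minimal sketch, assuming you can pull both sets of product codes as plain arrays; the function name and the `ghost_sku` label are ours.

```javascript
// Classify each credit-note line against BOTH the original shipment and
// the return scan. shippedSkus comes from the picking slip, returnedSkus
// from the return scan; both are assumed to be arrays of product codes.
function classifyLines(creditLines, shippedSkus, returnedSkus) {
  const shipped = new Set(shippedSkus);
  const returned = new Set(returnedSkus);
  return creditLines.map(line => {
    if (!shipped.has(line.sku)) {
      return { sku: line.sku, type: 'ghost_sku' };        // never on the order
    }
    if (!returned.has(line.sku)) {
      return { sku: line.sku, type: 'sku_not_received' }; // shipped, not returned
    }
    return { sku: line.sku, type: 'ok' };
  });
}
```

A reissued product code will show up here as a ghost SKU too, so in practice you would want to map old codes to new ones before running the comparison.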

Warning

The hardest failure mode of an email agent is not getting things loudly wrong. It is getting them quietly wrong. If the agent confidently accepts a credit note it should have flagged, nothing errors and you have no idea you lost the money. Build the audit log before you build the dispute draft.
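What an audit log entry could look like, as a sketch rather than a prescription. The field names are illustrative. The one design choice worth copying is that accepts are logged as well as disputes, with the inputs stored next to the verdict so any decision can be replayed later.

```javascript
// Build one audit-log line (JSON) per processed message, for accepts as
// well as disputes. Field names are illustrative.
function auditEntry(zalando, picqer, result, now = new Date()) {
  return JSON.stringify({
    at: now.toISOString(),
    rmaReference: zalando.rmaReference,
    verdict: result.verdict,
    reason: result.reason ?? null,
    issues: result.issues ?? [],
    // Keep the inputs so a reviewer can re-run the comparison by hand.
    zalandoLines: zalando.lines,
    picqerProducts: picqer ? picqer.products : null,
  });
}
```

Each line can be appended to whatever append-only store you already trust; the function itself has no side effects.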

Why a human still presses send

We do not auto-send disputes to Zalando. Three reasons.

First, the legal posture. An automated dispute is still a claim against another company, and that claim has the same liability profile whether a human typed it or a model drafted it. The agent can write the email. The company still authors it. The finance lead's review means a named person inside the business is the author of record on every claim that leaves the building, which is the position you want to be in if Zalando ever decides to push back on a pattern of disputes.

Second, silent drift is the failure mode that costs the most. If the agent quietly starts accepting credit notes it should have flagged, the loss reappears on the P&L and the agent's own logs look healthy. There is no error to alert on. The only reliable way we catch this is by keeping a human in the path. The finance lead does not read every accepted RMA, but she does read every dispute, which means she sees the agent's hardest outputs every working day. Drift gets noticed in week one, not in quarter one.

Third, there is a cultural reason. The finance lead was worried that the agent would replace her. The opposite happened. She now spends about 35 minutes a day reviewing dispute drafts, which is the part of the job that requires judgement, and she gets her Friday afternoons back. The CEO put it plainly when we did the handover: the goal was never to remove the role, just the worst hours of it. The CEOs we meet who think otherwise tend to lose their best operations people inside a quarter.

Results after eight weeks

The numbers from the first eight live weeks, against a six-week pre-baseline:

  • 1,820 RMAs reconciled per week, average. The 20 extra over the original 1,800 are an artefact of growth in the Zalando channel, not the agent.
  • €18,400 recovered per month, average. The lowest month was €15,900, the highest €21,300 (a heavy Black Friday returns wave that processed late).
  • 4.1% of disputes rejected by Zalando. Lower than we expected. The Picqer scan log is more persuasive evidence than we gave it credit for.
  • Finance lead time on RMAs: 35 minutes a day, down from a Friday afternoon plus a Saturday morning every other week.
  • Agent cost: roughly €240 a month in model calls and infra, against €18,400 recovered. The ROI math is not subtle.

The thing we did not measure but should have is the morale lift. The finance lead now ends Friday with the credit notes settled, not absorbed.

When we built the email agent for this Tilburg client, the part that surprised us was how often the dispute won on the strength of the Picqer scan alone. The lesson we carried into the next two AI agent projects was to start by writing the audit log, not the dispute draft. Everything else falls out of being able to prove what actually happened.

The small thing to do today

If you process Zalando returns at any meaningful volume, pull a six-week sample of credit notes against your warehouse system and compare them by hand. Do not build anything yet. Just count the disagreements and price them. At this client the disagreement rate was around 11% by volume and 4% by value. Most teams we have done this exercise with land somewhere between 6% and 14% by volume, with the value-weighted slice roughly a third of that. Once you have those two numbers, you will know whether an agent is worth building or whether you are already inside Zalando's tolerance band.
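Once the hand comparison is in a spreadsheet, the two numbers are a few lines of code. A minimal sketch, assuming one row per credit note with amounts in integer cents; `creditCents` and `disputedCents` are hypothetical column names.

```javascript
// Disagreement rate by volume and by value over a hand-checked sample.
// Each row: { creditCents, disputedCents }, where disputedCents is the
// part of the credit note you believe is wrong (0 when it checks out).
function disagreementRates(sample) {
  const disputed = sample.filter(r => r.disputedCents > 0);
  const totalCents = sample.reduce((s, r) => s + r.creditCents, 0);
  const disputedCents = disputed.reduce((s, r) => s + r.disputedCents, 0);
  return {
    byVolume: sample.length ? disputed.length / sample.length : 0,
    byValue: totalCents ? disputedCents / totalCents : 0,
  };
}

// Usage: four credit notes of €100 each, one of them €20 too high.
const rates = disagreementRates([
  { creditCents: 10000, disputedCents: 0 },
  { creditCents: 10000, disputedCents: 0 },
  { creditCents: 10000, disputedCents: 0 },
  { creditCents: 10000, disputedCents: 2000 },
]);
console.log(rates.byVolume, rates.byValue);
```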

Key takeaway

The point of an email agent is not to remove the human, it is to put their judgement where it earns the most money. Here, that was every dispute draft.

FAQ

Do we need Picqer specifically, or will any WMS work?

Any warehouse system that exposes inbound return data through an API will work. We have built variants on Channable, Logic4, and a custom MySQL backend. Picqer is just what this client happened to run.

How long did the build take?

Three weeks from kickoff to live. One week on the historic audit so we could price the problem, two weeks on the agent itself plus the dispute templates and the finance-lead review UI.

What happens if the agent drafts a wrong dispute?

Zalando rejects it and pays the original credit note. There is no double charge and no penalty. The downside of a bad dispute is one wasted email, not a fine, which is part of why the ROI math works.

Could you auto-send the disputes?

Technically yes. We do not, because the daily human review catches model drift early and because the named finance lead carries authorship of every claim that leaves the building.

Does this work for Bol.com or About You returns too?

Yes, with different parsers. The architecture (read mailbox, extract, join against your WMS, draft dispute) is identical. The credit-note formats and the dispute endpoints are what change per marketplace.

ai agents · email automation · automation · process automation · case study · integrations