Email automation
Email agent over a 13-year-old ERP: 740 RFQs in Enschede
On a Tuesday morning in March, the Enschede sales desk used to be 40 emails deep before coffee. Now an agent has read them, priced them, and parked them.

The 8:47 inbox
On a normal Tuesday in March, the sales desk at a 27-person industrial-textiles maker in Enschede used to open Outlook to 41 unread emails. Twenty-three were RFQs, mostly in German and Dutch, sometimes with a PDF of a part drawing or a tech-sheet for a heat-resistant coating. By 9:15 the two account managers had triaged the obvious ones. By 11:30 the trickier threads were sitting in a "needs ERP check" pile. The first real quote went out at 11:40. The last of the morning batch went out closer to lunch.
That was the baseline. Two hours and forty minutes from first email to first quote out the door, averaged across a full month of inbound RFQ traffic. The email agent we deployed in February took the same morning batch from 2h40 to 18 minutes. It did not miss a margin-floor override once in the three months we tracked it after go-live.
This is the story of how that worked and where it almost did not.
Why the inbox looked like that
The company makes technical textiles (heavy industrial belting, conveyor liners, abrasion sleeves) in batches of 20 to 600 metres for buyers across the DACH region and the Benelux. A typical RFQ has six variables: width, length, coating spec, cut pattern, lead time, and incoterm. The buyer rarely sends all six. Most threads start with two of the six and a vague reference to "same as last time" or "the spec we used for the Krefeld order".
The sales desk's real job was never typing the quote. It was reconstructing the spec from an incomplete email, the customer's history, and a peek at Ridder iQ to find the last shipment to that account. Ridder iQ is a Dutch ERP that has been the backbone of mid-size Benelux manufacturers since the early 2010s. Solid product. The deployment we walked into was on a 2013 schema with five layers of custom fields on the article master and a SOAP connector that nobody had touched since 2018.
The volume: 740 RFQ threads per week across six mailboxes. Roughly 60% repeat buyers, 30% adjacent variants of past orders, 10% genuinely new spec work that needed an engineer.
What the agent actually does
The pipeline is unglamorous. An incoming RFQ lands in a shared Microsoft 365 mailbox. A Graph subscription pushes the change notification to a queue. The agent does five things in order:
- Reads the thread and any attached PDFs, pulls out the six-variable spec, and fills the gaps it can ("same as last time" matched against the customer's order history).
- Hits the Ridder iQ SOAP endpoint to fetch the article master entries that match the spec, with current stock and standard cost.
- Computes a candidate quote using the price-book rules for that customer segment, the current FX rate for non-EUR threads, and the margin floor for that material class.
- Writes a draft reply in the original thread, in the original language, with the quote inline and the assumptions called out.
- Parks the thread in a "needs human signoff" folder if any of the four preceding steps was below the confidence threshold, or in a "ready to send" folder if all four cleared.
An account manager works the signoff folder first. The ready-to-send folder is reviewed in batch and released with one keystroke per thread.
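The parking rule in the last step is small enough to sketch as a pure function. This is an illustration, not the production code: the step names, the 0.85 threshold, and the `parkThread` helper are assumptions made for the example.

```typescript
// Hypothetical sketch: step names and the 0.85 threshold are assumptions.
type StepName = "extractSpec" | "fetchArticle" | "computeQuote" | "draftReply";

interface StepResult {
  step: StepName;
  confidence: number; // 0..1, as reported by that step
}

type ParkingFolder = "needs human signoff" | "ready to send";

const CONFIDENCE_THRESHOLD = 0.85;

// A thread is ready to send only if every step before parking cleared the threshold.
function parkThread(
  results: StepResult[],
  threshold: number = CONFIDENCE_THRESHOLD,
): ParkingFolder {
  const allCleared =
    results.length > 0 && results.every((r) => r.confidence >= threshold);
  return allCleared ? "ready to send" : "needs human signoff";
}

const morningThread: StepResult[] = [
  { step: "extractSpec", confidence: 0.97 },
  { step: "fetchArticle", confidence: 0.99 },
  { step: "computeQuote", confidence: 0.92 },
  { step: "draftReply", confidence: 0.74 }, // one weak step is enough to hold the thread
];

console.log(parkThread(morningThread)); // "needs human signoff"
```

An empty result list also parks in signoff. If a step failed to report at all, a human should look at the thread.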
Ridder iQ and the SOAP connector that almost killed it
The integration was the hard part. Ridder iQ exposes a SOAP API that is, charitably, of its era. Most fields you actually want for a price calculation are not in the default WSDL: the custom margin overrides, the customer-specific discount ladder, the "do not quote below X" flag. They live on extension tables added by the implementer in 2017, each with its own naming convention.
We ended up writing a thin Node service that wraps the SOAP client and exposes a clean REST surface for the agent. About 600 lines of TypeScript. Most of it is field-name translation.
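// abn-ridder-bridge/src/articles.ts
import { createClientAsync } from "soap";
import { z } from "zod";

// The six-variable RFQ spec, reduced to what the article lookup needs.
const ArticleSpec = z.object({
  articleCode: z.string(),
  customerCode: z.string().optional(),
  width_mm: z.number(),
  length_m: z.number(),
  coating: z.string(),
});

export async function fetchArticleWithOverrides(
  spec: z.infer<typeof ArticleSpec>,
) {
  // Validate at the boundary so a malformed spec never reaches the SOAP layer.
  const parsed = ArticleSpec.parse(spec);
  const client = await createClientAsync(process.env.RIDDER_WSDL!);

  // node-soap's *Async methods resolve to a tuple; the first item is the result.
  const [base] = await client.GetArticleAsync({
    code: parsed.articleCode,
  });
  const [overrides] = await client.GetCustomerOverridesAsync({
    customer: parsed.customerCode ?? "",
    article: parsed.articleCode,
  });

  // Translate legacy field names into the clean surface the agent sees.
  return {
    standardCost: base.StdKostprijs, // 2013 Dutch field name
    // null when no override row came back for this customer and article
    marginFloorPct: overrides?.MinMargePct ?? null, // extension table
    discountLadder: overrides?.KortingTrap ?? null, // extension table
    listPrice: base.Verkoopprijs,
  };
}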
The field-name translation alone took two weeks. The 2013 schema uses Dutch abbreviations in places and English in others. The 2017 extension tables are their own dialect. The clean REST surface was non-negotiable: the agent has no business knowing what StdKostprijs means.
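One way to keep that translation honest is to hold it in a single table rather than scattering it across handlers. The sketch below is hypothetical: only the four legacy names shown in the bridge code come from the real deployment, while `FIELD_MAP`, `toCleanRecord`, and the `InternNotitie` field are invented for illustration.

```typescript
// Hypothetical translation table. Only these four legacy names appear in the bridge code.
const FIELD_MAP = {
  StdKostprijs: "standardCost",
  Verkoopprijs: "listPrice",
  MinMargePct: "marginFloorPct",
  KortingTrap: "discountLadder",
} as const;

type LegacyField = keyof typeof FIELD_MAP;
type CleanField = (typeof FIELD_MAP)[LegacyField];

// Translate a raw ERP row into the clean surface, dropping anything not mapped.
function toCleanRecord(
  raw: Record<string, unknown>,
): Partial<Record<CleanField, unknown>> {
  const clean: Partial<Record<CleanField, unknown>> = {};
  for (const legacy of Object.keys(FIELD_MAP) as LegacyField[]) {
    if (legacy in raw) {
      clean[FIELD_MAP[legacy]] = raw[legacy];
    }
  }
  return clean;
}

// "InternNotitie" is a made-up unmapped field; it must not leak to the agent.
const rawRow = { StdKostprijs: 41.5, Verkoopprijs: 58, InternNotitie: "niet tonen" };
console.log(toCleanRecord(rawRow)); // { standardCost: 41.5, listPrice: 58 }
```

The allow-list direction matters. A new custom field added in the ERP stays invisible to the agent until someone maps it on purpose.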
If you are integrating an agent against a legacy ERP, build the adapter first and treat it as a real product. The agent will get a model upgrade every six months. The adapter will be load-bearing for a decade.
The margin-floor override
The bit the sales desk cared about most was the override logic. Every material class has a margin floor: the percentage below which a quote needs explicit signoff, no matter what the customer history suggests. A quote that breaches the floor without signoff is the kind of mistake that, on a 600-metre order, can wipe out a month of profit on that product line.
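To make the stakes concrete, here is the arithmetic as a sketch. It assumes "margin" means gross margin on the selling price rather than markup on cost, and every number in it is illustrative, not taken from the Enschede price book.

```typescript
// Assumption: margin is gross margin on the selling price, not markup on cost.
function marginPct(unitPrice: number, unitCost: number): number {
  if (unitPrice <= 0) throw new RangeError("unitPrice must be positive");
  return ((unitPrice - unitCost) / unitPrice) * 100;
}

// Lowest unit price that still meets the floor: cost / (1 - floor).
function minPriceForFloor(unitCost: number, floorPct: number): number {
  if (floorPct >= 100) throw new RangeError("floorPct must be below 100");
  return unitCost / (1 - floorPct / 100);
}

// Illustrative numbers only.
const orderMetres = 600;
const costPerMetre = 41.5; // EUR
const quotedPerMetre = 52.0; // EUR
const floorForClass = 22; // percent

const quotedMargin = marginPct(quotedPerMetre, costPerMetre);
const shortfallPerMetre = Math.max(
  0,
  minPriceForFloor(costPerMetre, floorForClass) - quotedPerMetre,
);

console.log(`margin ${quotedMargin.toFixed(1)}% vs floor ${floorForClass}%`);
console.log(`shortfall on the order: EUR ${(shortfallPerMetre * orderMetres).toFixed(2)}`);
```

A gap of about a euro per metre looks harmless on one line of a quote. Multiplied by the order length, it is the kind of number that needs a signature.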
The naive implementation, "if computed margin < floor, route to human", was not enough. The actual rule the desk used was layered: floor varies by customer segment, by raw-material lot (some lots have promotional pricing the supplier wants moved), and there are two named accounts that have a contractually fixed floor that overrides the segment default.
We encoded the full ladder in a single decision function. It returns one of four states: auto_send, review_quick, review_engineer, or block_with_reason. The agent is not allowed to send anything that is not auto_send. It is allowed to draft for the other three.
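A minimal sketch of what such a decision function can look like follows. The four state names are the real ones. Everything else is an assumption for illustration: the field names, the 0.9 confidence threshold, the precedence of contract floor over lot floor over segment default, and which condition maps to which review state.

```typescript
// Hypothetical sketch. Field names, precedence, and condition-to-state mapping are assumptions.
type Decision =
  | { state: "auto_send" }
  | { state: "review_quick"; reason: string }
  | { state: "review_engineer"; reason: string }
  | { state: "block_with_reason"; reason: string };

interface QuoteContext {
  marginPct: number; // computed margin of the candidate quote
  segmentFloorPct: number; // default floor for the customer segment
  lotFloorPct?: number; // promotional floor for a specific raw-material lot
  contractFloorPct?: number; // contractually fixed floor for a named account
  isNewSpec: boolean; // no comparable past order, so an engineer is needed
  minFieldConfidence: number; // lowest confidence across extracted spec fields
}

const MIN_FIELD_CONFIDENCE = 0.9; // illustrative threshold

// Assumed precedence: contract floor, then lot floor, then segment default.
function resolveFloorPct(ctx: QuoteContext): number {
  return ctx.contractFloorPct ?? ctx.lotFloorPct ?? ctx.segmentFloorPct;
}

function decide(ctx: QuoteContext): Decision {
  const floor = resolveFloorPct(ctx);
  if (ctx.marginPct < floor) {
    return {
      state: "block_with_reason",
      reason: `margin ${ctx.marginPct}% is below floor ${floor}%`,
    };
  }
  if (ctx.isNewSpec) {
    return { state: "review_engineer", reason: "no comparable past order" };
  }
  if (ctx.minFieldConfidence < MIN_FIELD_CONFIDENCE) {
    return { state: "review_quick", reason: "low confidence on an extracted field" };
  }
  return { state: "auto_send" };
}

// Only auto_send may leave without a human; the other three states are drafts.
function mayAgentSend(d: Decision): boolean {
  return d.state === "auto_send";
}

const repeatBuyer: QuoteContext = {
  marginPct: 25,
  segmentFloorPct: 22,
  isNewSpec: false,
  minFieldConfidence: 0.95,
};

console.log(decide(repeatBuyer).state); // "auto_send"
console.log(decide({ ...repeatBuyer, contractFloorPct: 28 }).state); // "block_with_reason"
```

The floor check runs first on purpose. A breach should be reported as a breach even when the thread would have needed review for another reason.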
The reason this matters more than usual right now: engineering forums keep surfacing the same pattern of agents taking actions their operators did not authorise and cannot easily unwind. Anthropic's building-effective-agents writeup points the same way, as we read it: keep the high-stakes decisions in deterministic code, and use the model where ambiguity is the actual problem. An RFQ reply is a quote. A quote in writing is, in most B2B contexts in the EU, binding enough to honour. The agent has to err on the side of "I drafted, you press send" for every cent below the floor.
What 18 minutes actually means
The 2h40 to 18 minutes number is the headline, but it is worth being honest about what is underneath.
The 18 minutes is the time from inbox-arrival to outbound-sent, for the threads the agent classified as auto_send. That is roughly 62% of the morning batch. Another 28% land in review_quick and go out within an hour, because the account manager only has to glance at the assumptions and press send. The remaining 10%, the review_engineer and block_with_reason threads, still take half a morning. They always will.
The honest framing: the agent did not make the whole sales desk faster across the board. It made the boring two-thirds of the inbox close to instant, freed the account managers to spend their morning on the engineer-grade threads, and reduced the average customer's wait time from "before lunch" to "before the second espresso". The win is not "AI replaces the sales desk". The win is the sales desk spending its mornings on the 10% of threads that actually need a human, instead of the 90% that do not.
Three things that broke in the first month
The PDF parser was first. Buyers send tech-sheets in everything from clean vector PDFs to phone-photo scans of a printout. The agent's spec-extraction was 94% accurate on vector PDFs and 71% accurate on scanned ones. We added a confidence score per extracted field and demoted any quote with a scanned-PDF input to review_quick, no matter what the margin said. False precision is worse than no quote at all.
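The demotion rule can be sketched as a guard that runs after the margin decision. The names (`applyExtractionGuards`, `PdfKind`, `ExtractedField`) and the 0.9 threshold are assumptions for the example. So is the choice to leave stricter states untouched rather than relabel them.

```typescript
// Hypothetical sketch: names and the 0.9 threshold are assumptions.
type PdfKind = "vector" | "scanned" | "none";
type RouteState = "auto_send" | "review_quick" | "review_engineer" | "block_with_reason";

interface ExtractedField {
  name: string; // e.g. "width", "coating"
  value: string;
  confidence: number; // 0..1, per extracted field
}

// Demote only: a thread already in a stricter state keeps that state.
function applyExtractionGuards(
  proposed: RouteState,
  pdfKind: PdfKind,
  fields: ExtractedField[],
  minConfidence: number = 0.9,
): RouteState {
  if (proposed !== "auto_send") return proposed;
  const hasWeakField = fields.some((f) => f.confidence < minConfidence);
  return pdfKind === "scanned" || hasWeakField ? "review_quick" : "auto_send";
}

const scannedSheet: ExtractedField[] = [
  { name: "width", value: "800", confidence: 0.99 },
  { name: "coating", value: "HR-2", confidence: 0.98 },
];

// High field confidence does not rescue a scanned input.
console.log(applyExtractionGuards("auto_send", "scanned", scannedSheet)); // "review_quick"
```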
The "same as last time" lookup was second. The agent was, in its first week, matching on customer plus article code and getting the wrong order roughly 4% of the time. Customers reuse the same article code with different cuts. We rewrote the lookup to require customer plus article code plus at least one matching variable from the new RFQ (width, length, or coating). The 4% dropped to 0.3%, all of which were caught by the review_quick human pass.
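A sketch of the tightened lookup, under stated assumptions: the record shapes, the sample codes, and the rule of picking the most recent qualifying order are all illustrative. The only part taken from the actual fix is the matching condition itself, customer plus article code plus at least one shared variable.

```typescript
// Hypothetical record shapes; only the matching rule comes from the fix described above.
interface PastOrder {
  customerCode: string;
  articleCode: string;
  widthMm: number;
  lengthM: number;
  coating: string;
  shippedOn: string; // ISO date, so string comparison orders by time
}

interface PartialRfq {
  customerCode: string;
  articleCode: string;
  widthMm?: number;
  lengthM?: number;
  coating?: string;
}

// True when the RFQ states at least one variable and it equals the past order's value.
function sharesVariable(rfq: PartialRfq, order: PastOrder): boolean {
  return (
    (rfq.widthMm !== undefined && rfq.widthMm === order.widthMm) ||
    (rfq.lengthM !== undefined && rfq.lengthM === order.lengthM) ||
    (rfq.coating !== undefined && rfq.coating === order.coating)
  );
}

// Returns the most recent qualifying order, or undefined so a human resolves it.
function findLastMatchingOrder(
  rfq: PartialRfq,
  history: PastOrder[],
): PastOrder | undefined {
  const candidates = history.filter(
    (o) =>
      o.customerCode === rfq.customerCode &&
      o.articleCode === rfq.articleCode &&
      sharesVariable(rfq, o),
  );
  candidates.sort((a, b) => b.shippedOn.localeCompare(a.shippedOn));
  return candidates[0];
}

// Same customer, same article code, different cuts (made-up codes).
const orderHistory: PastOrder[] = [
  { customerCode: "C-1042", articleCode: "BELT-HR", widthMm: 800, lengthM: 120, coating: "HR-2", shippedOn: "2023-11-02" },
  { customerCode: "C-1042", articleCode: "BELT-HR", widthMm: 1200, lengthM: 300, coating: "HR-5", shippedOn: "2024-01-15" },
];

const sameAsLastTime: PartialRfq = { customerCode: "C-1042", articleCode: "BELT-HR", widthMm: 800 };
console.log(findLastMatchingOrder(sameAsLastTime, orderHistory)?.shippedOn); // "2023-11-02"
```

Matching on customer and article code alone would have returned the January order, which is the wrong cut. An RFQ that states none of the three variables matches nothing and falls through to a human.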
The German salutation was third. The agent was, in week one, replying to buyers it had quoted for years with "Sehr geehrte Damen und Herren". The sales desk reads that as cold. We added a per-contact salutation cache populated from the last 18 months of sent mail, with a fallback to the formal default only for unknown contacts. Small thing. Customers noticed immediately.
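The cache itself is a few lines. In this sketch the greeting is assumed to have been parsed out of each sent mail already, and the `SentMail` shape, the function names, and the sample address are hypothetical. The 18-month window and the formal fallback are as described above.

```typescript
// Hypothetical sketch: assumes the greeting was already parsed from each sent mail.
interface SentMail {
  to: string;
  sentAt: string; // ISO date
  greeting: string;
}

const FORMAL_DEFAULT = "Sehr geehrte Damen und Herren";

// Keep the most recent greeting per contact within the look-back window.
function buildSalutationCache(
  sent: SentMail[],
  todayIso: string,
  windowMonths: number = 18,
): Record<string, string> {
  const cutoff = new Date(todayIso);
  cutoff.setMonth(cutoff.getMonth() - windowMonths);

  const latest: Record<string, SentMail> = {};
  for (const mail of sent) {
    if (new Date(mail.sentAt).getTime() < cutoff.getTime()) continue; // too old
    const key = mail.to.toLowerCase();
    const seen = latest[key];
    if (!seen || mail.sentAt > seen.sentAt) latest[key] = mail;
  }

  const cache: Record<string, string> = {};
  for (const key of Object.keys(latest)) {
    const entry = latest[key];
    if (entry) cache[key] = entry.greeting;
  }
  return cache;
}

// Unknown contacts fall back to the formal default.
function salutationFor(email: string, cache: Record<string, string>): string {
  return cache[email.toLowerCase()] ?? FORMAL_DEFAULT;
}

const sentMail: SentMail[] = [
  { to: "K.Vogel@example.de", sentAt: "2024-06-03", greeting: "Guten Tag Herr Vogel" },
  { to: "k.vogel@example.de", sentAt: "2024-11-04", greeting: "Hallo Herr Vogel" },
  { to: "alt@example.de", sentAt: "2022-01-10", greeting: "Hallo Frau Brandt" },
];

const salutations = buildSalutationCache(sentMail, "2025-03-11");
console.log(salutationFor("k.vogel@example.de", salutations)); // "Hallo Herr Vogel"
console.log(salutationFor("neu@example.de", salutations)); // formal default
```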
What did not change
The pricing model. The customer relationships. The two account managers. The engineer. The Ridder iQ deployment. The folder structure in Outlook. The Tuesday morning sales standup, which still happens at 9:00 because the routine is the routine.
What changed is that the standup is no longer about clearing the inbox. It is about the threads in review_engineer. That is a better conversation to be having.
How to think about doing this on your own ERP
A short version, for the reader with a similar problem who is not going to hire anyone to solve it.
Build the adapter first. Whatever ERP you are on (Ridder iQ, Exact, Unit4, AFAS, SAP Business One, your custom 2011 PHP thing), the connector is the load-bearing part. The agent on top is replaceable. The clean REST surface in front of your ERP is what you will still be using in 2030.
Encode the override logic as a decision function, not a prompt rule. If the agent is the only thing that knows the margin floor, you cannot test it, you cannot audit it, and you cannot swap models without re-validating the whole thing. A pure function with unit tests is what stands between you and an expensive mistake.
Default to drafting. The agent should send only what clears the confidence threshold you have set. Everything else is a draft for a human to release. The minute you push that threshold down, you are gambling with quotes in writing.
And measure the right thing. "Time saved" is the wrong metric for a sales desk because account managers will fill the freed hours with whatever lands in their inbox. The right metric is customer wait time from RFQ-received to quote-sent. That number is visible to the buyer and it is what they are actually shopping on.
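Measuring it takes two timestamps per thread. The sketch below reports the median wait, which is our choice for the example because one slow engineer-grade thread can distort a mean. The `ThreadTiming` shape and the sample timestamps are made up.

```typescript
// Hypothetical sketch: the record shape and the sample data are illustrative.
interface ThreadTiming {
  receivedAt: string; // ISO timestamp of the inbound RFQ
  sentAt: string; // ISO timestamp of the outbound quote
}

// Customer wait in minutes, from RFQ received to quote sent.
function waitMinutes(t: ThreadTiming): number {
  return (new Date(t.sentAt).getTime() - new Date(t.receivedAt).getTime()) / 60000;
}

function median(values: number[]): number {
  if (values.length === 0) throw new RangeError("median of empty list");
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

const tuesdayBatch: ThreadTiming[] = [
  { receivedAt: "2025-03-11T08:47:00Z", sentAt: "2025-03-11T09:05:00Z" },
  { receivedAt: "2025-03-11T08:50:00Z", sentAt: "2025-03-11T09:40:00Z" },
  { receivedAt: "2025-03-11T08:55:00Z", sentAt: "2025-03-11T11:35:00Z" },
];

console.log(median(tuesdayBatch.map(waitMinutes))); // 50
```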
One small thing to do today
If you run a quote-heavy inbox, spend twenty minutes this afternoon counting two numbers: how many RFQ threads landed last week, and what percentage of them were variants of a quote you had already sent. That ratio is the upper bound on what an email agent can take off your desk. If it is above 50%, the math is probably already in your favour. When we built the email agent for the Enschede shop, the ratio was 88%. We started writing the adapter the same week.
Key takeaway
Build the adapter first, encode the high-stakes rules as deterministic code, and let the agent draft everything it is not sure about.
FAQ
Does the agent ever send a quote on its own?
Only when the decision function returns auto_send, which requires high confidence on every extracted field and a computed margin comfortably above the floor. Everything else drafts and waits for a human.
How long did the integration with Ridder iQ take?
About six weeks end to end. Two weeks were spent on field-name translation against the 2013 schema and the 2017 extension tables. The agent layer on top took roughly ten days.
What model runs the spec extraction?
A frontier model with vision for the PDF pass, and a smaller model for the structured-field cleanup. The choice is abstracted behind the adapter so it can be swapped without touching the ERP code.
What happens when the model is wrong about a quote?
It almost always shows up as a low confidence score on one of the six spec fields, which routes the thread to review_quick. The block_with_reason state catches the rest before anything goes out.
Can this work for a smaller business with a simpler ERP?
Yes, and it is usually easier. The bottleneck on this project was the legacy SOAP surface. A REST-native ERP cuts the adapter work in half.