Process automation for Navision 2009: a medtech case study
A Groningen medtech distributor runs on Navision 2009 and reconciles 2,180 weekly POs across two inbound formats. Here is the process-automation agent we built for it, and the line we drew.

It was 16:40 on a Friday in Groningen and Henk had two screens open. On the left, a stack of purchase orders that had landed via EDI overnight. On the right, a Navision 2009 session he had inherited from a colleague who retired in 2019. The match rate that afternoon was 91%. The other 9% were the reason he was still in the office.
This post is about the 9%, and the process-automation agent we built to make them visible without taking the decisions away from him.
A 14-year-old ledger that still runs the business
The client is a 31-person medtech distributor north of the city. They source Class IIa and IIb medical devices from a handful of upstream vendors and ship them to Dutch hospitals and care homes. Their ledger is Microsoft Dynamics NAV 2009, customised over a decade by two consultants who are no longer reachable.
Mainstream support for that version ended in January 2015. Extended support closed in January 2020, per Microsoft's own lifecycle page. The team knew. They had scoped a migration to Business Central twice and shelved it twice. The customisations are heavy. The chain of receipts goes back three audit cycles. The cost of breaking that chain was always higher than the cost of running an EOL stack for one more year.
If you have ever been the operations lead at a small distributor, you know this calculation. The migration is always next year, and next year is busy.
2,180 purchase orders a week, two formats, one ledger
Throughput is 2,180 purchase orders a week. Two upstream sources do most of the volume. The consumables wholesaler sends EDIFACT D.96A for the medical kit. The foodservice distributor, who supplies the catering line that goes to care homes alongside the devices, sends a flat CSV with a 14-column header that has not changed since 2016, except for the day someone added a VAT column without telling anyone downstream.
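A surprise column is easier to survive when the CSV is read by header name rather than by position. A minimal sketch of that idea, assuming hypothetical column names (`po_number`, `vendor_sku`, `gtin`, `qty`, `unit_price`), since the real 14-column header is not reproduced in this post:

```python
import csv
import io
from decimal import Decimal

# Hypothetical column names; the distributor's actual header will differ.
REQUIRED_COLUMNS = {"po_number", "vendor_sku", "gtin", "qty", "unit_price"}


def read_po_lines(csv_text):
    """Parse PO lines by header name so an added column cannot shift fields."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    missing = REQUIRED_COLUMNS - header
    if missing:
        # Fail loudly instead of silently reading the wrong column.
        raise ValueError(f"missing columns: {sorted(missing)}")
    lines = []
    for row in reader:
        lines.append({
            "po_number": row["po_number"].strip(),
            "vendor_sku": row["vendor_sku"].strip() or None,
            "gtin": row["gtin"].strip() or None,
            "qty": Decimal(row["qty"]),
            "unit_price": Decimal(row["unit_price"]),
        })
    # Report unexpected columns so someone downstream hears about them.
    return lines, sorted(header - REQUIRED_COLUMNS)


# Made-up sample in which a vat_rate column has appeared mid-header.
sample = (
    "po_number,vendor_sku,gtin,qty,vat_rate,unit_price\n"
    "PO-1001,SKU-77,,12,9,4.50\n"
)
po_lines, extra_columns = read_po_lines(sample)
```

With positional parsing, the new column would have pushed the VAT rate into the price field. Here the price is still read correctly and the unexpected column is reported back to the caller.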
Items are matched on a hybrid key. Some lines carry a GTIN. Some carry a vendor SKU. Some carry both, but the SKU drifted in 2021 when one product family was renumbered upstream and nobody backfilled the historical lines. So the reconciliation is not a join. It is a scored match, and the score sometimes lies.
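The hybrid key can be sketched as a small resolver: prefer the GTIN when it is present, fall back to the vendor SKU, and pass old SKUs through an alias table covering the 2021 renumbering. The lookup tables, item numbers, and function name below are all hypothetical, and this is an illustration of the idea rather than the production matcher:

```python
# Hypothetical lookup tables; in practice these would come from the item master.
GTIN_TO_ITEM = {"08712345678906": "ITEM-100"}
SKU_TO_ITEM = {"NEW-4410": "ITEM-100", "CAT-9": "ITEM-200"}
# Old SKU -> new SKU for the product family renumbered upstream in 2021.
SKU_ALIASES = {"OLD-441": "NEW-4410"}


def resolve_item(gtin, vendor_sku):
    """Return (item_no, how), where how records which key produced the result."""
    by_gtin = GTIN_TO_ITEM.get(gtin) if gtin else None
    sku = SKU_ALIASES.get(vendor_sku, vendor_sku) if vendor_sku else None
    by_sku = SKU_TO_ITEM.get(sku) if sku else None

    if by_gtin and by_sku and by_gtin != by_sku:
        # Both keys resolve but disagree: never pick one silently.
        return None, "conflict"
    if by_gtin:
        return by_gtin, "gtin"
    if by_sku:
        how = "sku-alias" if sku != vendor_sku else "sku"
        return by_sku, how
    return None, "unresolved"
```

Returning the `how` label alongside the item number keeps the provenance of each match visible, which matters when a reviewer later asks why a drifted SKU was accepted.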
The MDR layer nobody can skip
This is where a normal three-way match becomes interesting. The distributor moves regulated devices. Under the EU Medical Device Regulation (2017/745), every device that crosses a Dutch loading dock carries a Unique Device Identifier (UDI) that must be traceable from manufacturer to end user. A mismatch between a goods-receipt line and a wholesaler PO is not an admin error you fix on Monday. It is an audit finding waiting to be discovered three weeks later, when the lot number is already at a hospital and the regulator wants to know which carton went where.
So the reconciliation is two jobs in one trench coat. The first job is financial: match the PO line to the goods receipt to the invoice and propose a journal entry. The second job is regulatory: when a line does not match, decide whether the mismatch is a price or quantity discrepancy (controller's queue) or a UDI, lot, or batch discrepancy (quality officer's queue). Those two people answer to different auditors, and a single queue would bury one inside the other.
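The split between the two queues can be sketched as a routing function over the fields that differ. The field names are hypothetical, and the rule that a field missing on one side does not count as a mismatch is an assumption made for this sketch:

```python
from enum import Enum


class Queue(Enum):
    CONTROLLER = "controller"
    QUALITY_OFFICER = "quality-officer"


# Groupings follow the split described above; field names are hypothetical.
REGULATORY_FIELDS = {"udi", "lot_no", "batch_no"}
FINANCIAL_FIELDS = {"unit_price", "qty"}


def _differs(a, b):
    # Assumption: only fields present on both sides can mismatch.
    return a is not None and b is not None and a != b


def route_mismatch(po_line, receipt_line):
    """Return the queue for a mismatched pair, or None if nothing differs."""
    differing = {
        field
        for field in REGULATORY_FIELDS | FINANCIAL_FIELDS
        if _differs(po_line.get(field), receipt_line.get(field))
    }
    if differing & REGULATORY_FIELDS:
        # A regulatory discrepancy wins even if price or quantity is also off.
        return Queue.QUALITY_OFFICER
    if differing & FINANCIAL_FIELDS:
        return Queue.CONTROLLER
    return None
```

A line with both a price difference and a UDI difference goes to the quality officer first, so the regulatory question is never buried under a financial one.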
The agent we built, and the line we drew
We started by giving the agent read-only access. Nothing else. The Navision 2009 SQL Server sits inside their network. We exposed it through an ODBC connection scoped to four views: Purchase Header, Purchase Line, Item Ledger Entry, Vendor Ledger Entry. No write permission. No stored-procedure permission. The agent could read the world. It could not change it.
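The database grant is the real control, but a cheap second layer in the agent's own code makes the scope explicit. A sketch of such a guard, assuming simplified, hypothetical SQL names for the four views (real Navision object names look different). It is a belt-and-braces check, not a SQL parser and not a substitute for the read-only permission:

```python
import re

# Hypothetical, simplified names for the four exposed views.
ALLOWED_VIEWS = {
    "purchase_header",
    "purchase_line",
    "item_ledger_entry",
    "vendor_ledger_entry",
}

_FROM_OR_JOIN = re.compile(
    r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE
)
_FORBIDDEN = re.compile(
    r"\b(?:insert|update|delete|merge|exec|execute|drop|alter|create|"
    r"truncate|grant|into)\b",
    re.IGNORECASE,
)


def check_query(sql):
    """Raise PermissionError unless sql is one SELECT over the allowed views."""
    statement = sql.strip().rstrip(";")
    if ";" in statement:
        raise PermissionError("multiple statements are not allowed")
    if not statement.lower().startswith("select"):
        raise PermissionError("only SELECT statements are allowed")
    if _FORBIDDEN.search(statement):
        raise PermissionError("statement contains a write or DDL keyword")
    views = {name.lower() for name in _FROM_OR_JOIN.findall(statement)}
    if not views or not views <= ALLOWED_VIEWS:
        outside = sorted(views - ALLOWED_VIEWS)
        raise PermissionError(f"views outside the allow-list: {outside}")
    return statement
```

The guard is deliberately strict: it will reject some harmless queries (for example, one with the word "update" inside a string literal), which is the safer failure mode for an agent that should only ever read.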
That decision is the spine of the project, and we will come back to why.
The matching logic is plain code, not a model call. Models are useful for the messy text on a supplier invoice PDF, but the match itself is deterministic:
def within(a, b, tolerance):
    """True if a is within a relative tolerance of b.

    Assumed meaning: the original helper is not shown, so 0.02 is read as 2%.
    """
    return abs(a - b) <= tolerance * abs(b)


def match_po_to_receipt(po_line, candidate_receipts):
    """
    Returns (best_match, confidence, route).
    Confidence below 0.92 routes to the controller's queue.
    UDI mismatch always routes to the quality-officer queue,
    regardless of financial confidence.
    """
    scored = []
    for r in candidate_receipts:
        s = 0.0
        if r.item_no == po_line.item_no:
            s += 0.50
        # A missing lot number or unit price adds nothing to the score.
        if r.lot_no and r.lot_no == po_line.lot_no:
            s += 0.25
        if abs(r.qty - po_line.qty) < 0.001:
            s += 0.15
        if (r.unit_price and po_line.unit_price
                and within(r.unit_price, po_line.unit_price, 0.02)):
            s += 0.10
        scored.append((r, s))
    if not scored:
        return None, 0.0, "no candidates"
    best, conf = max(scored, key=lambda x: x[1])
    # The UDI check runs first so it overrides any financial confidence.
    if best.udi and po_line.udi and best.udi != po_line.udi:
        return best, conf, "udi-mismatch -> quality-officer"
    if conf < 0.92:
        return best, conf, f"low-confidence ({conf:.2f}) -> controller"
    return best, conf, "auto-proposed (awaiting sign-off)"
Three things matter about that code. The score never invents a field: if a receipt has no lot number or unit price, that rule contributes nothing rather than guessing, and the UDI check only fires when both sides carry a UDI. A UDI mismatch overrides high financial confidence, because a perfectly priced, perfectly counted carton with the wrong UDI is the most dangerous row in the file. And the highest possible route is auto-proposed, never auto-posted.
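The weights and the threshold interact in a way worth checking by hand. A quick enumeration, using only the four weights and the 0.92 cut-off from the function above, lists which combinations of matching fields can reach the auto-proposed route:

```python
from itertools import combinations

# Weights and threshold copied from the matching function.
WEIGHTS = {"item_no": 0.50, "lot_no": 0.25, "qty": 0.15, "unit_price": 0.10}
THRESHOLD = 0.92


def combos_clearing_threshold(weights, threshold):
    """Return every set of matching fields whose summed weight reaches the threshold."""
    fields = sorted(weights)
    passing = []
    for size in range(1, len(fields) + 1):
        for combo in combinations(fields, size):
            total = sum(weights[f] for f in combo)
            if total >= threshold:
                passing.append(combo)
    return passing


passing = combos_clearing_threshold(WEIGHTS, THRESHOLD)
```

Only the combination of all four fields passes. The best three-field total is 0.90 (item, lot, and quantity), so a line with a missing or out-of-tolerance price, or with no lot number, always lands in the controller's queue rather than being auto-proposed.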
Models as translators, never actors
Behind the matching logic sits an ingestion pipeline that has to deal with the fact that invoices, unlike POs, do not arrive in EDIFACT or CSV. The consumables wholesaler sends a PDF that has been laid out the same way for nine years and is a delight to parse. The foodservice distributor sends a PDF whose layout changed twice in 2025 after their finance team rolled out new templates with no advance notice. For those documents we do use a model, not for the journal, but to extract line items into the same shape the deterministic matcher expects.
The model output goes into a staging table that the matcher reads. It never touches the ledger directly. If the model gets a line wrong, the worst case is a low-confidence match that routes to the controller's queue, which is the same place wrong lines went before. The pattern is worth naming. When a model sits behind a deterministic stage, the model is a translator, not an actor. It can be wrong; the cost of being wrong is bounded; the human at the end of the pipeline catches what the model missed. That separation underpins any process automation we run against a legacy ledger.
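The translator boundary can be sketched as a validation step between the model and the staging table. The record shape, field names, and sanity rules below are assumptions for illustration. The point is only that malformed model output is set aside for a human, never repaired by guessing:

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Optional


@dataclass(frozen=True)
class StagedInvoiceLine:
    """Shape the matcher reads; field names are hypothetical."""
    invoice_no: str
    item_no: str
    qty: Decimal
    unit_price: Decimal
    lot_no: Optional[str] = None


def stage_model_output(raw_lines):
    """Split model-extracted dicts into valid staged lines and rejects."""
    staged, rejected = [], []
    for raw in raw_lines:
        try:
            lot = raw.get("lot_no")
            line = StagedInvoiceLine(
                invoice_no=str(raw["invoice_no"]).strip(),
                item_no=str(raw["item_no"]).strip(),
                qty=Decimal(str(raw["qty"])),
                unit_price=Decimal(str(raw["unit_price"])),
                lot_no=(str(lot).strip() or None) if lot else None,
            )
            # Assumed sanity rules: non-empty keys, positive qty, no negative price.
            if (not line.invoice_no or not line.item_no
                    or line.qty <= 0 or line.unit_price < 0):
                raise ValueError("empty key or out-of-range number")
        except (KeyError, ValueError, InvalidOperation) as exc:
            # Keep the raw line and the reason so a human can review it.
            rejected.append((raw, repr(exc)))
            continue
        staged.append(line)
    return staged, rejected
```

Because the staged record is frozen and strictly typed, the deterministic matcher downstream never has to wonder whether a quantity is a number or a sentence fragment.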
Two queues, two auditors
The controller's queue is a side-by-side view: proposed journal entry on the left, source documents on the right, three buttons (sign, reject, edit). She can clear ninety lines in an hour because the agent has already done the boring matching work and her job is to read the eight that are wrong.
The quality officer's queue is structured around the device, not the journal. Each row is a UDI, a lot number, a batch, and a question. The wholesaler says this lot shipped on Tuesday. We received seven cartons. Six match the manifest. The seventh has a UDI that resolves to a previous-generation product code. Decide. He decides. The agent writes an append-only audit-trail entry. He signs. The associated journal proposal then routes back to the controller for financial sign-off.
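An append-only trail can be made tamper-evident by chaining each entry to the hash of the one before it. The sketch below shows that general technique. The hash chain, the field names, and the in-memory storage are assumptions for illustration, not a description of the production log:

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(body):
    # Canonical JSON so the same body always hashes to the same value.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditTrail:
    """Append-only log; each entry carries the hash of the previous one."""

    def __init__(self):
        self._entries = []

    def append(self, actor, udi, lot_no, decision, at=None):
        prev_hash = self._entries[-1]["hash"] if self._entries else "0" * 64
        body = {
            "actor": actor,
            "udi": udi,
            "lot_no": lot_no,
            "decision": decision,
            "at": (at or datetime.now(timezone.utc)).isoformat(),
            "prev_hash": prev_hash,
        }
        entry = dict(body, hash=_digest(body))
        self._entries.append(entry)
        return dict(entry)

    def entries(self):
        # Return copies so callers cannot edit history in place.
        return [dict(e) for e in self._entries]

    def verify(self):
        """Recompute the chain; False means an entry was altered or removed."""
        prev_hash = "0" * 64
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

The class exposes no update or delete method, and `verify()` lets an auditor confirm that nothing was rewritten after the quality officer signed.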
Two queues, two signatures, one ledger. The agent moves the work; the humans take the decisions that have audit weight.
Why the agent never posts a journal line on its own
This is the line we drew, and it deserves an explanation, because every two weeks somebody asks us to move it.
There is no shortage of recent stories about AI agents that took irreversible actions against cloud APIs, production databases, or source repositories, and left the humans who deployed them with no path back. The pattern is consistent across vendors and stacks, and so is the signal: an agent that takes irreversible actions is an agent that can ruin your week on a Tuesday.
A posted journal line in Navision is effectively irreversible. You can offset it with a reversing journal, but the original stays on the ledger, and a Dutch auditor will see both. The cost of an agent posting one wrong journal is higher than the cost of a controller signing off on three hundred right ones. The arithmetic does not even need a calculator.
In any system where the wrong action leaves a permanent record, the agent proposes and the human disposes. There is no upside to letting it post on its own.
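The propose-and-dispose rule can be sketched as a tiny state machine in which posting is only reachable through a named human's signature. All class, field, and function names here are hypothetical. `post_fn` stands in for whatever posting routine the controller's own tooling uses, since the agent itself holds no write access:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PROPOSED = "proposed"
    SIGNED = "signed"
    REJECTED = "rejected"


@dataclass
class JournalProposal:
    """A proposed journal entry awaiting a human decision."""
    document_no: str
    amount_cents: int
    status: Status = Status.PROPOSED
    signed_by: Optional[str] = None

    def _decide(self, controller_id, new_status):
        if self.status is not Status.PROPOSED:
            raise ValueError(f"proposal is already {self.status.value}")
        if not controller_id:
            raise ValueError("a decision needs a named human")
        self.status = new_status
        self.signed_by = controller_id

    def sign(self, controller_id):
        self._decide(controller_id, Status.SIGNED)

    def reject(self, controller_id):
        self._decide(controller_id, Status.REJECTED)


def post_to_ledger(proposal, post_fn):
    """Call post_fn only for proposals that a human has signed."""
    if proposal.status is not Status.SIGNED or not proposal.signed_by:
        raise PermissionError("unsigned proposals are never posted")
    return post_fn(proposal)
```

There is no code path from `PROPOSED` to the ledger that skips `sign()`, which is the property worth preserving whatever the real implementation looks like.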
The framing we use with clients: an agent's value is the work it removes, not the responsibility it absorbs. The controller's job before our agent was 80% matching and 20% deciding. After, it is 5% reading and 95% deciding. Her throughput tripled. Her audit exposure did not move a millimetre.
The first six weeks, which were ugly
We should be honest about the rollout. The first six weeks were not a smooth glide into productivity. Henk did not trust the agent. He should not have trusted the agent. We had built it; he had not. So the first six weeks ran as a parallel system. The agent proposed; he matched by hand; we compared.
On day one, the agent agreed with him 81% of the time. The 19% gap was almost entirely the SKU drift from the 2021 renumbering, which we had not yet taught the matcher about. By week four the agreement was 96%, and the disagreements were mostly cases where the agent was right and Henk had been doing it the wrong way for two years out of habit.
The point of the parallel run was not to validate the agent. It was to give the controller a basis for trusting the queue she was about to start signing. Without that, every sign-off would have carried the cognitive load of a fresh decision. With it, by the time the queue went live, she trusted the score because she had spent six weeks watching it correlate with her own judgement.
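The comparison behind a parallel run is simple enough to sketch. Assuming both the agent's and the human's results are available as mappings from a PO line id to the chosen receipt id (a hypothetical shape), the agreement rate and the list of disagreements fall out directly:

```python
def agreement_rate(agent_matches, human_matches):
    """Share of PO lines where agent and human picked the same receipt.

    Both arguments map a PO line id to a receipt id (or None for "no match").
    Only lines present in both mappings are compared.
    """
    reviewed = [po for po in human_matches if po in agent_matches]
    if not reviewed:
        raise ValueError("no overlapping PO lines to compare")
    disagreements = sorted(
        po for po in reviewed if agent_matches[po] != human_matches[po]
    )
    rate = (len(reviewed) - len(disagreements)) / len(reviewed)
    return rate, disagreements


# Made-up ids for illustration only.
agent = {"PO1-1": "R1", "PO1-2": "R2", "PO2-1": None, "PO3-1": "R9"}
human = {"PO1-1": "R1", "PO1-2": "R3", "PO2-1": None, "PO3-1": "R9"}
rate, disagreements = agreement_rate(agent, human)
```

The list of disagreements is the more useful output: each one is either a matcher rule to fix or a manual habit to revisit, and reading them together is what builds trust in the score.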
What changed at the desk on Friday
The agent has been in production since January. Last week's numbers: 2,180 POs reconciled, up from roughly 1,940 a year ago, because the team is no longer skipping the long tail to clear the urgent queue. Mean controller review time per batch of ninety proposed entries fell from 47 minutes of manual matching to 8.2 minutes of reading and signing. In the first quarter the queue caught fourteen UDI mismatches, of which two were genuine MDR findings that would otherwise have shipped to hospitals. The number of journal entries posted without a controller sign-off is zero, and it will stay zero.
Henk goes home at 16:40 on Friday now. The 9% are still there. They are just on his screen in the morning, sorted, scored, and waiting for a decision instead of a search.
The smallest thing you can copy this week
If you run finance against an EOL ledger and the migration budget keeps slipping a quarter, the place to start is not the migration. Inventory the reconciliation work first. Count the lines, count the queues, and count the irreversible actions. Anywhere a person is doing matching work that a deterministic script could do, that is automation territory. Anywhere a person is making a judgement call with audit weight, that stays human. The agent lives between them, and its mandate ends at the sign-off page.
When we built the process automation for the Groningen distributor, the temptation we kept returning to was letting the agent close the loop on the easy 91%. We chose not to. The ledger is older than the youngest person in the warehouse and it will outlive the agent. The agent's job is to prepare the work, not to do it.
Key takeaway
In any system where the wrong action leaves a permanent record, the agent proposes and the human disposes. There is no upside to letting it post on its own.
FAQ
Why didn't you migrate them off Navision 2009 instead of automating around it?
Migration to Business Central was scoped twice and shelved twice on cost and risk. The agent is cheaper, removes the operational pain that was driving migration appetite, and lets the team migrate on their own schedule rather than under deadline pressure.
How does the agent avoid fabricating matches when source data is dirty?
The match score only credits fields that exist in both records. If a row has no lot number or no unit price, the rule for that field contributes zero rather than guessing, and the UDI check only fires when both records carry a UDI. Low totals route to a human queue instead of a confident-looking wrong answer.
What happens during an MDR audit?
The append-only log gives the auditor a per-line trail: who saw each mismatch, who decided, when, and against which source document. Auditors have so far preferred this trail to the previous spreadsheet-based one.
Could the same pattern work for other legacy systems, not just Navision?
Yes. The pattern is read-only ingest, scored deterministic match, two queues for two kinds of decision, and human sign-off on every irreversible action. The substrate matters less than the rule that the agent never takes the action itself.
How big does a team need to be before this pays for itself?
The break-even is not a matter of headcount; it is the cost of one bad ledger entry. Anywhere a wrong journal triggers an audit response, a refund, or a regulator letter, an agent that proposes rather than posts pays for itself the first time it catches a mismatch.