Process automation
LIMS reconciliation agent: a freezer-inventory playbook
A 19-person Leuven biotech was reconciling 1,420 freezer samples a week by hand across LabVantage and Access. Here is the agent we built, and the GxP queue that kept it audit-safe.

The wet-lab manager pings us at 16:47 on a Friday. "Freezer #7 says we have 38 vials of CHO-K1 lysate. LabVantage says 41. The Access sheet says 39. Which one do I trust before the weekend?"
Multiply that single message by 1,420 sample rows a week and you have what every Friday at this 19-person Leuven biotech looked like before we got involved. Two scientists routinely stayed until 21:00 reconciling freezer counts across a 16-year-old LabVantage LIMS, a homegrown Microsoft Access database from 2011, and an iPad log that the bench tech updated by hand when she remembered to.
This post is the playbook for the agent we shipped to handle that reconciliation. The interesting part is not the matching code. The interesting part is the queue we built to stop the agent from ever writing to the GxP-audited tier without a human in the loop.
The reconciliation problem at a 19-person biotech
A quick inventory of the systems in play, because the story does not work without them.
- LabVantage LIMS, on-prem, Oracle backend, last major upgrade in 2018. Holds the canonical sample record for anything that ever touches the clinical-grade workflow.
- Access database, built in 2011 by a postdoc who left in 2018, hosting 14 years of derivative-assay data and a forest of VBA macros that nobody fully maps.
- Freezer log, a shared iPad app written in FileMaker that the bench scientists update after each pull or deposit.
The lab cannot drop the Access database. It cannot rip out LabVantage either. The LabVantage license alone runs about €38k a year, and the vendor's migration estimate came in at 18 months and seven figures. We were asked to make the three systems agree with each other on a 15-minute cron, not to replace anything.
1,420 weekly entries was the average across the prior 12 months. Two FTEs were each spending around six hours a week reconciling. That is 12 person-hours a week, or roughly €31k a year in fully-loaded salary, eaten by spreadsheet diffing.
Why two systems still exist
It is tempting to write "the right answer is to consolidate onto one system" and move on. We tried. The Access database is a moat of institutional knowledge. There are macros in it that take a 12-column assay export, derive seven QC metrics, and feed them back into a worksheet the wetenschappelijk directeur (scientific director) reviews on Monday morning. Anyone who has tried to port Access VBA to a modern stack knows this is a months-long project where you discover an undocumented rule on week eight that breaks every assumption from week one.
The LIMS, meanwhile, is where any sample destined for a clinical-grade preparation lives. The compliance tier matters. FDA 21 CFR Part 11 and EMA Annex 11 both require electronic records in this tier to carry attributable, contemporaneous, original, and accurate audit trails. An automated process is allowed to write into that tier, but each write needs a verifiable human approval recorded against it.
So the brief was narrower than "fix the data". It was: reconcile everywhere we are allowed to, and route everything else to a human queue.
The pipeline shape
We landed on a four-stage pipeline. It runs every 15 minutes. The whole thing is Python, about 1,800 lines, deployed as a single container on the lab's existing on-prem Proxmox host. The data does not leave the building.
- Pull. A read-only Oracle connection scrapes the LabVantage
SAMPLEandSTORAGE_LOCATIONtables. A separate ODBC connection reads the Access tables. The iPad freezer log syncs to a SQLite mirror on the same host via a 5-minute pull. - Normalize. Sample IDs are written three different ways across the systems. We canonicalise them in a Postgres staging schema.
- Match. The agent groups rows that refer to the same physical vial and produces a diff per vial.
- Route. Each diff goes to one of three buckets: auto-apply, human-review, or clinical-quarantine. Only the first writes back without a human signature.
Normalising the sample IDs
LabVantage stores samples as CHO-K1-001. The Access database uses CHOK1_0001. The freezer log is whatever the bench scientist typed, which sometimes is cho k1 1 at 18:00 on a Friday. Before any matching can happen, the IDs have to land in one shape.
import re
CELL_LINE = re.compile(r"^([A-Za-z0-9]+?)[\-_\s]*(K?\d+)[\-_\s]*0*(\d+)$")
def canonical_id(raw: str) -> str | None:
s = raw.strip().upper().replace(" ", "")
m = CELL_LINE.match(s)
if not m:
return None
line, variant, serial = m.groups()
return f"{line}-{variant}-{int(serial):04d}"
assert canonical_id("CHO-K1-001") == "CHO-K1-0001"
assert canonical_id("CHOK1_0001") == "CHO-K1-0001"
assert canonical_id("cho k1 1") == "CHO-K1-0001"
For anything that does not match the regex (legacy IDs from before the 2014 naming convention) we fall back to a fuzzy match using the pg_trgm extension in Postgres. Threshold 0.78, picked after testing against 800 hand-labelled examples. Anything below the threshold goes straight to the human-review bucket. We do not let the agent guess on legacy IDs.
The clinical-grade reagent queue
This is the part of the build that took the longest to design and the least time to code.
LabVantage carries a flag on each sample called REGULATORY_TIER. Values are RESEARCH, CLINICAL_GRADE, and GMP. The agent will not write to a row with CLINICAL_GRADE or GMP until the wetenschappelijk directeur has approved the diff.
The approval surface is the smallest interface we could ship. It is a single web page, hosted on the same on-prem container, listing pending diffs grouped by sample. Each row has a green check, a red cross, and a free-text field for an audit-trail comment. The check button writes back to LabVantage and stamps the row with the director's e-signature. The cross writes a reject record. Both actions are logged, immutable, and exported nightly to the lab's audit storage.
Never let the agent decide what counts as clinical-grade. Read the flag from the system of record. We had a tempting shortcut where we inferred the tier from the protocol code (anything in the CL-XXX series is clinical). It worked for 1,418 of the first 1,420 samples. Sample 1,419 was a research-tier vial reused in a clinical protocol with an old code. That single misclassification could have invalidated a batch record.
What broke in week three
Three things broke that we did not predict.
The Access database locked. Access opens an .ldb lock file when any client is connected. Our ODBC reader was holding the connection for the duration of the cron run, which was roughly 90 seconds. During that window, the postdoc trying to update the QC worksheet got a sharing violation. Fix: switched to a snapshot read, dropped the connection inside 4 seconds, and shifted the cron to start at :02, :17, :32, :47 to dodge the manual save patterns the lab had grown into.
LabVantage rate-limited us. The web-services tier on the old LabVantage build has an undocumented soft cap around 60 writes per minute. The first week of full operation, the agent batched 312 reconciled writes at once on Monday morning. The middleware quietly dropped the tail. We added an exponential backoff and a 50-write-per-minute hard ceiling on our side, with persistent retry across cron runs.
Staging table bloat. We were truncating the Postgres staging table after each successful run. After a month, dead tuples and index bloat had made the cron job slow enough that runs overlapped. We re-built the staging table as a partitioned table keyed by cron-run timestamp, then dropped (not deleted) old partitions after retention. Same result, no bloat, no autovacuum drama.
Operational shape after 90 days
The agent has been running for 14 weeks at the time of writing. Numbers from the last full month:
- 1,418 samples reconciled per week on average. The volume held steady.
- 87.4% auto-applied without human review (research-tier, high-confidence match).
- 9.1% routed to the human-review bucket for any tier.
- 3.5% routed to the clinical-grade queue for director approval.
- Median time from diff detection to director approval: 4 hours 12 minutes, driven by working hours rather than the queue.
- FTE hours recovered: 11.5 per week. The two scientists got their Friday evenings back.
The single number we care about most is the false-merge rate. To date it is zero, measured against a weekly spot audit of 40 sampled diffs. Not because the matcher is perfect, but because the legacy-ID fallback and the clinical-grade queue catch the cases where the matcher would have been wrong.
What we would change next time
Two things, both about the interface, not the engine.
First, we would build the approval queue mobile-first. The wetenschappelijk directeur reviews diffs on her phone between meetings far more often than at her desk. We shipped a desktop layout first and rebuilt it mobile-first in week six.
Second, we would make the reject-comment field mandatory from day one. Half of the early rejects came back blank, which meant we could not retrain the auto-apply confidence thresholds. The field is now required, and the comments feed a weekly review of where the matcher is being conservative without reason.
The smallest thing you can do today
If you run a lab, an ops team, or any business with two systems of record that disagree, do not start with the integration. Start with the queue. Build a single page that lists the diffs, and let a human approve or reject each one for a week. The rules you write down in week two are the ones that survive the audit in year two.
When we built the freezer reconciliation agent for the Leuven biotech, the part we kept rewriting was not the matcher but the routing logic that decided which diffs the agent was allowed to touch on its own. That kind of process-automation work only earns trust by deferring early and often.
Key takeaway
Start with the human approval queue, not the matcher. The routing rules are what earn the trust that survives the audit.
FAQ
How long did the build take?
Eleven weeks from kick-off to full operation. Most of that was approval-queue iteration and edge-case discovery, not the matching engine itself.
Can the agent write directly to a GxP-validated system?
Only to research-tier rows. Anything tagged clinical-grade or GMP routes to a human approval queue so the e-signature requirements under 21 CFR Part 11 and EMA Annex 11 are satisfied.
What stack did you use?
Python, Postgres with pg_trgm and table partitioning, plus Oracle and ODBC clients for the legacy systems. Deployed as one container on the lab's on-prem Proxmox host. No cloud.
How do you avoid matching errors on legacy sample IDs?
Anything below a 0.78 trigram similarity score goes straight to a human-review queue. The agent does not guess on the long tail of pre-2014 naming conventions.