Legacy sites

Oracle Forms to chat agent: a read-only CDC playbook

The Oracle Forms backend was 22 years old. The Dutch municipality wanted a permit-status chat agent. We were not allowed to add a single row to the database.

Jacob Molkenboer· Founder · A Brand New Company· 15 Feb 2025· 10 min

Leather logbook with green ribbon, brass key on index card, iron shipping tag on ivory linen, dark shadows right.

The clerks at Gemeente X had a Tuesday-morning rhythm. Phone rings. Resident on the line. Wat is de status van mijn dakkapelvergunning? One clerk pulls up an Oracle Forms screen from 2004, tabs through six forms, types the BSN, waits for the green status band at the bottom, reads back a code. Then translates the code into plain Dutch. Then explains why it has been five weeks.

They wanted a chat agent. Residents would ask the question themselves, get an answer in plain Dutch, and only call when the agent could not help. The catch: the Oracle Forms backend was 22 years old, the database underneath was a hand-tuned 11g instance that nobody wanted to wake up, and the original vendor billed by the form. We were not allowed to add a single row to that database.

Three rules from week one:

The legacy database is read-only. No new writes, no schema changes, no triggers.
The agent must never invent a permit status.
The lag between Oracle and the chat answer stays under 90 seconds.

This is the wiring, end to end.

The constraint that shapes everything

Oracle Forms apps from the early 2000s usually talk to an Oracle database directly. CHAR(1) status fields. Padded numbers. Hand-rolled audit tables. Stored procedures that mutate state in five places. Refactoring is not the right word for what it would take to modernise. It is a political event, and the municipality was clear that it would not happen this budget cycle.

What we got instead was one concession from the DBA team: enable Oracle LogMiner and grant us a read-only role with SELECT_CATALOG_ROLE plus LOGMINING. No agent installed on the database server. No daemon. No cron. A JDBC connection over a site-to-site IPSec tunnel, that the DBA could revoke in thirty seconds.

That was the whole budget. Everything else had to live outside.

Debezium as the read-only door

The standard pattern here is change data capture. Debezium's Oracle connector reads the redo log through LogMiner and emits row-level change events. It runs entirely client-side from the database's perspective. We hosted it on a small Ubuntu VM (Java 17, Kafka 3.6, Debezium 2.5) inside our own subnet, dialling out to the Oracle SCAN listener.

The minimum config that gave us a clean stream:

name=permits-oracle
connector.class=io.debezium.connector.oracle.OracleConnector
database.hostname=oracle.gemeente.internal
database.port=1521
database.user=cdc_reader
database.password=${file:/run/secrets/cdc:pwd}
database.dbname=GEMPROD
database.pdb.name=GEMPDB
snapshot.mode=schema_only_recovery
log.mining.strategy=online_catalog
log.mining.batch.size.default=20000
log.mining.batch.size.max=100000
table.include.list=GEM.VERGUNNINGEN,GEM.VERGUNNING_STATUS,GEM.AANVRAGER
schema.history.internal.kafka.bootstrap.servers=kafka:9092
schema.history.internal.kafka.topic=schema-history.gem

A few notes from running this against real load. online_catalog is dramatically cheaper than redo_log_catalog, but only valid if the schema does not change underneath you. For a 22-year-old system that has not had a column added since 2017, that bet was safe. We watched the LogMiner session with a daily query against V$LOGMNR_STATS just in case.

The other knob worth tuning: log.mining.batch.size. The default is 20,000. In the Dutch permit-application peak (March through May, when every other homeowner files for a dakkapel or a terrace), we saw the redo volume jump by 6x. The connector kept up by raising the max batch size to 100k. Below that, we got minute-scale lag spikes.

Warning

Debezium's Oracle connector needs the redo log retained long enough to recover from a connector restart. If your DBA team aggressively rotates archive logs, agree on a 24-hour minimum retention window before you ship.

A 90-second mirror in Postgres

Kafka topics land in a Postgres 16 database with the pgvector extension. The schema is not a 1:1 copy of Oracle. We rebuild it for the queries the agent actually needs to make.

create table permit (
  id              bigint primary key,
  reference       text not null unique,        -- "OV/2024/00472"
  bsn_hash        bytea not null,              -- never store raw BSN
  type            text not null,               -- "dakkapel", "terras", ...
  submitted_at    timestamptz not null,
  status_code     text not null,               -- raw Oracle code
  status_nl       text not null,               -- plain-Dutch translation
  last_event_at   timestamptz not null,
  embedding       vector(1024)                 -- only on the free-text fields
);

create index permit_bsn_hash_idx   on permit (bsn_hash);
create index permit_reference_idx  on permit (reference);
create index permit_embedding_idx  on permit
  using hnsw (embedding vector_cosine_ops);

The consumer is a small Go service that reads the Debezium Kafka topic, normalises the row (Oracle's CHAR(1) status codes get translated into plain Dutch through a lookup table the municipality's communications team maintains), and upserts into Postgres. It is 240 lines of code and it has not been touched in four months.

One design choice worth defending: we do not store the raw BSN (Dutch social-security number) in the mirror. Residents authenticate via DigiD before they talk to the agent, and the agent fetches by bsn_hash using a per-session salt. The Oracle side still has the BSN in cleartext because that is how the original system of record works. The mirror is a read model. It does not need to know.

The 90-second freshness contract

The 90-second number is not a technical limit. Debezium gives us sub-second lag in normal operation. The 90 seconds is a contract with the resident: if the mirror falls behind, the agent stops answering and tells the user to call.

The check that enforces it:

-- Run on every tool call from the agent
select extract(epoch from (now() - max(last_event_at))) as lag_seconds
from permit
where last_event_at > now() - interval '24 hours';

If lag_seconds goes above 90, the agent's first tool call (the one that looks up the permit) returns a structured error and the agent reads from a fallback script: Het vergunningensysteem is op dit moment niet bereikbaar. Belt u alstublieft 14 0XX. We tested this by killing the Debezium VM during a Friday afternoon staging run. Time to fallback: 14 seconds. The agent never improvised a status.

Takeaway

A read model is not a cache. It is a contract about freshness. Write the contract down and let the agent fail closed when the contract breaks.

Embeddings only where they earn their keep

pgvector is in this stack for one job: matching a resident's free-text question (Wat is er met mijn vergunning voor de dakkapel aan de zijkant?) to the right permit record when they do not have the reference number handy. We embed three fields: type, status_nl, and the most recent comment text from the case file. That is it. No embedding of structured columns. No embedding of metadata.

The reason is cost and drift. Embeddings on short structured values produce false neighbours. A terras permit and a dakkapel permit can sit a hair's breadth apart in vector space even though they are semantically distinct in municipal law. We learned that during the pilot and switched to exact-match on type, with vector search reserved for the free-text comment field.

The agent never composes its own answer

The municipality's legal team had one non-negotiable: the agent could not produce a permit status from the model's own reasoning. Not even a hedged one. Under the EU AI Act, a public-sector tool that informs residents about administrative decisions sits in the higher-risk bucket, and the deployer, the gemeente in this case, is the one who has to answer for what it says. The model vendor is a footnote when a citizen takes a complaint to the ombudsman.

So the agent is forbidden from generating any factual status sentence. Every status answer has to come from a structured tool call that returns a record from Postgres, and the agent's job is to render that record into Dutch. If the tool returns nothing, the agent reads the fallback script. If the tool returns a record older than 90 seconds, same.

We enforce this at the prompt layer with forced tool use and a system instruction that the model is not allowed to compose status sentences from its own knowledge. The fallback script lives in a JSON file the communications team owns. They edit it without filing a ticket with us.

The wider principle: do not let the agent's confidence be the load-bearing wall. Make the data the load-bearing wall, and let the agent be glass.

What we watch

Three dashboards keep the pipeline honest. The first is the freshness query above, sampled every fifteen seconds and graphed in Grafana. The second pulls Debezium's JMX metrics: MilliSecondsSinceLastEvent, NumberOfCommittedTransactions, and the LogMiner-specific CurrentScn against the SCN at last commit. The third is an hourly reconciliation job that picks a hundred random permit references from the mirror, queries Oracle directly for the same hundred, and asserts the status codes match.

The reconciliation has never disagreed in production. We still run it, because the day it disagrees is the day we want to find out at 14:00, not from a resident at 17:30 on a Friday.

Alerting is boring on purpose. Lag above 60 seconds for more than two minutes pages the on-call. Lag above 90 at any moment trips the resident-facing fallback. A reconciliation mismatch pages immediately. The agent itself never knows that an engineer was paged. It only knows that its tool returned a structured error and it has a script to read.

What we would do differently

The 90-second window is more conservative than it needs to be. We could push it to 30 seconds without changing any code, because Debezium already runs sub-second in steady state. We left the window wide because the DBA team wanted headroom during quarterly Oracle patching, and we agreed it was cheaper to over-promise on lag than to fight that fight.

If we were starting again, we would also skip the embedding column for the first six months. Residents almost always know their reference number or their BSN. Vector search is the long tail. Build it after you ship.

The five-minute version

If you are sitting on a legacy backend that you cannot modify, and you need to put a chat agent in front of it, the shape is:

A read-only CDC feed from the legacy database (Debezium covers Oracle, MySQL, SQL Server, Postgres).
A purpose-built read model in a database you control, refreshed continuously.
A freshness contract that the agent enforces by failing closed.
An agent that only renders records, never composes them.

When we built this for the gemeente, the thing we had not anticipated was the political cost of asking the DBA team to enable LogMiner. It took six weeks of meetings before the connection went live. We ended up solving it by writing the role-grant SQL in advance and letting the DBA review every line before they ran it. That work, the legacy migration path that goes around the database rather than through it, is most of what we do at ABN now.

If you want the smallest thing you can do tomorrow: open your legacy database's documentation and find the CDC story. Oracle has LogMiner. SQL Server has Change Tracking. MySQL has the binlog. Postgres has logical replication. Whatever it is, that is your read-only door. The rest is plumbing.

Key takeaway

A read model is not a cache. It is a contract about freshness. Write the contract down and let the agent fail closed when the contract breaks.

FAQ

Why not just call the Oracle database directly from the agent?

Two reasons. Read latency on a 22-year-old Oracle Forms backend is unpredictable under load. And every direct query couples your agent's uptime to the DBA team's patching schedule.

Does Debezium need an agent installed inside Oracle?

No. The Oracle connector runs as a JDBC client. It needs a database user with LogMiner privileges and network access to the listener. Nothing is installed on the Oracle server itself.

How do you stop the chat agent from inventing a status when the mirror is stale?

A freshness check on every tool call. If the mirror is more than 90 seconds behind the redo log, the tool returns a structured error and the agent reads a fallback script written by the municipality.

Why pgvector instead of a dedicated vector database?

The mirror already needed Postgres for the structured permit records. Running one database instead of two cut operational surface in half, and pgvector's HNSW index is fast enough for our query volume.

legacy sitesai agentschat agentsmigrationarchitectureintegrations

Building something?

Start a project