Security

AI agents in production databases: a pre-flight checklist

Before any agent at ABN reads a single row from a client's production database, it passes a written pre-flight checklist. This is that list, and the gotchas that earned each line.

Jacob Molkenboer· Founder · A Brand New Company· 4 Jun 2026· 9 min

Closed leather logbook with brass key, cream index card, green ribbon, and red wax fragment on ivory paper.

It is a Tuesday and a client wants their new support agent live by Friday. The agent needs to answer questions about order status, shipping windows, and stock levels. Marketing has the launch email queued, operations has a Slack channel ready, and the only thing missing is a DSN for the production database.

This is the point at which we slow down.

The point of a written checklist

We have fourteen agents in production. Five of them touch a live database directly. The rest go through an API or a snapshot. Every time we plug a new one in, we run the same checklist. It is not because we trust ourselves less than we used to. It is because the failure modes here are quiet ones. A wrong scope on a credential will not crash. A missing statement timeout will not raise an alert. The first time anyone notices is when a customer sees somebody else's invoice quoted back in a chat reply.

A written checklist also forces the question nobody wants to ask in a kickoff: who pays for it if the agent does something wrong on Monday morning. When the answers are in a document, they can be reviewed before launch, not after. We pin the checklist in the agent's repo and require a tick on every line before a credential leaves the secret store.

Credentials, identity, and least power

Every agent gets its own database role. Never the application role. Never a shared analytics role. Its own role, named for the agent, with its own password, rotated on its own schedule.

The role gets SELECT on the views the agent is allowed to read, and nothing else. Not on the underlying tables. Views give us a place to filter columns and rows before the agent sees a thing. Permissions on schemas are revoked by default, then granted back narrowly.

-- order-agent role: read-only, view-scoped, statement-bounded
CREATE ROLE order_agent LOGIN PASSWORD :'pw';

REVOKE ALL ON SCHEMA public FROM order_agent;
GRANT USAGE ON SCHEMA agent_views TO order_agent;
GRANT SELECT ON agent_views.orders_safe   TO order_agent;
GRANT SELECT ON agent_views.shipments_safe TO order_agent;

ALTER ROLE order_agent SET statement_timeout = '3s';
ALTER ROLE order_agent SET idle_in_transaction_session_timeout = '5s';
ALTER ROLE order_agent SET lock_timeout = '1s';

The role's password lives in the agent's own secret. The application's role lives in the application's secret. If somebody compromises one, they have not compromised the other. Rotation runs on a thirty-day cadence by default, faster if the agent has shipped a new tool that quarter.

Scope of the access

Every column the agent can read is a column the agent can leak. So we draw two lines.

The first line is the column line. The view exposes the columns the agent's job actually requires, no more. If the agent answers questions about shipping windows, it does not need the customer's tax ID. If it answers about order status, it does not need the rep's internal discount note. The list is short, written, and reviewed in a pull request before the view is created.

The second line is the row line. Postgres row-level security gives us a clean way to say "this agent can only see rows owned by the tenant in this session variable". When the agent connects on behalf of a tenant, the policy enforces the boundary in the database, not in our application code. The PostgreSQL row security policies docs are short and worth the ten minutes.

We also write a denylist test. A pytest file that, for every agent role, runs SELECT on every sensitive table the agent is not supposed to read, and asserts a permission denied error. That test runs in CI on every change to the schema. A schema migration that quietly grants the agent something new will fail the build, not the customer.

Query shape and budget

A loose agent can hold a connection open for a long time, run an expensive join, and bring a production cluster to its knees. None of this requires malice. A prompt that says "summarise everything about this customer over the last year" is enough.

So we set budgets per role at the database level. Statement timeout of two to three seconds. Idle-in-transaction timeout of five seconds. Lock timeout of one second. A connection pool dedicated to the agent, with a hard maximum well below the application's pool.

The point of the pool boundary is that even if the agent runs hot, the application's pool keeps working. The order page stays up. The checkout stays up. The agent is the thing that fails, and that is a failure we can recover from cleanly. The pool gets its own dashboard panel and its own alert thresholds, so a noisy agent is visible inside thirty seconds rather than after the on-call complains.

Observability and the per-query receipt

Every query the agent runs gets logged with three things: the SQL, the parameters, and the model turn that produced it. Without the third item, an audit is useless. You see a strange query in the log and you cannot tell whether the user asked for it, the model invented it, or a prompt injection from inside the database told the model to write it.

We store these receipts in a separate database (not the production one) and we keep them for ninety days minimum, longer if the client's compliance position requires it. They are what we send to a customer when they ask "what did your agent do with my data on the third of June". A surprised client wants an answer in minutes, not a week of grepping.

Alerts go on the obvious shapes. A query that returns more rows than expected. A query that touches a column on the denylist. A query shape the agent has not run before, this week, on this tenant. None of these stop the query. They tell us about it, and we triage in the morning.

Prompt injection as the operating baseline

Academic work on self-propagating prompt-injection attacks against agent stacks (the so-called "agent worm" demonstrations published over the last year) makes a point that sounds alarmist until you sit down and try to design against it: any string the agent reads from anywhere might be an instruction. An attacker writes a prompt inside a document or a database row, the agent reads it, the agent acts on it, and the attacker is now steering the agent's tools.

The honest position is that customer notes, support ticket bodies, product descriptions, anything a human typed somewhere and a different human did not sanitise, is data shaped like text and nothing more. The OWASP LLM Top 10 calls this LLM01 and it is the first one for a reason.

We design around three assumptions:

The agent cannot trust the content of any row as instructions. Tool calls and structured outputs are the only path to action. Freeform text in a row is treated as data.
The agent's tools have their own authorisation. Just because the agent thinks it should refund an order does not mean it can. The refund tool checks the actor, the order, and the amount on its own.
Side effects (writes, emails, webhooks) require a confirmation turn that includes the user's original ask, not the agent's interpretation of it.

This is closer to how a junior employee gets given the company card. You do not let them spend it because the customer was persuasive. You let them spend it because the rule said yes.

# Refund tool: authorisation is the tool's job, not the model's job
def refund(order_id: str, amount_cents: int, actor: User) -> Refund:
    order = orders.get(order_id)
    if order.customer_id != actor.customer_id:
        raise Forbidden("actor does not own order")
    if amount_cents > order.refundable_cents:
        raise BadRequest("amount exceeds refundable balance")
    if amount_cents > actor.refund_ceiling_cents:
        raise NeedsApproval("over ceiling, escalate")
    return payments.refund(order, amount_cents, reason="agent")

Warning

Never paste a database row straight into a system prompt without delimiters. The "ignore your previous instructions" trick still works in 2026 when the row is concatenated into a template without boundaries the model can recognise. Wrap untrusted content in a clearly marked block, and tell the model in advance that anything inside that block is data.

Kill switches, rollback, and a real dry run

Two things have to be ready before launch.

The first is a kill switch. A single environment flag, or a feature flag, that takes the agent offline within thirty seconds and routes traffic to whatever the previous behaviour was. The kill switch is tested in staging before the launch and then again in production on launch day. If the only person who knows how to disable the agent is on a plane, the kill switch does not exist.

The second is a dry run on real traffic shape. We point the agent at a staging mirror with anonymised but production-shaped data, and we replay the last week of real user questions through it. We grade the outputs against a small expectations file. Anything below the threshold means we are not going live on Friday. We are going live on Tuesday after we have figured out why.

The checklist itself

We keep the actual list in a markdown file in the agent's repo. Here is the shape, redacted of client-specifics:

Dedicated database role created, password rotated, kept in the agent's secret store only.
Role has SELECT on agent views only. Denylist test green in CI.
Row-level security policy in place. Tenant scope enforced in the database, not in the app.
statement_timeout, idle_in_transaction_session_timeout, and lock_timeout set on the role.
Dedicated connection pool, max connections set, monitored separately from the app pool.
Query log table provisioned in a separate database, with model-turn correlation.
Alerts wired for high-row-count, denylist-column, and novel-query-shape events.
Refund, write, and email tools have their own authorisation rules, independent of the model.
Row content is treated as data, never as instructions. System prompt delimiters in place.
Kill switch tested in staging and in production.
Dry run on a week of replayed real traffic, graded against an expectations file.
Incident runbook on the on-call doc, with the kill-switch command at the top.

Twelve lines. On most projects the list fits on one A4 page. The discipline is not in the length. It is in not skipping a line because Friday is close.

Where this came from

When we built the operations agent for a Dutch fulfilment client earlier this year, the line that bit us was the row-scope one. The view held the tenant boundary, but the agent occasionally needed to answer a question that crossed it. We ended up giving the agent a second, separate "anonymous aggregate" role, with a hard-coded aggregation query rather than a free DSN. That kind of choice is what most of the work on an AI agents project actually looks like: not the model, not the prompt, but the boundary you draw around what the model is allowed to touch.

If you have an agent on the roadmap and you are about to point it at production, the smallest useful thing you can do today is open the database, list the columns the agent is supposed to read, and write the view. The rest of the checklist starts to make sense once that view exists.

Key takeaway

Give every agent its own database role with SELECT on purpose-built views, set statement timeouts, log every query alongside its model turn, and test a kill switch.

FAQ

Why not give the agent the application's database role?

Different roles fail differently. If the agent's credential leaks, you do not want it carrying write access to the rest of the application. Separate role, separate password, separate rotation.

How short should the statement timeout be?

Two to three seconds is usually fine for a customer-facing agent. The query budget exists to fail fast under a runaway prompt, not to optimise. Tune up only when a real query needs more.

Do views replace row-level security?

No. Views filter columns cleanly, but row scope is safer enforced in the database with RLS. The view and the policy do different jobs and you usually want both.

What goes into the per-query receipt?

The SQL, the bound parameters, the model turn that produced it, the actor, and the tenant scope. Without the model turn, you cannot tell whether a strange query came from a user, the model, or a row.

ai agentssecurityarchitectureoperationsintegrationsautomation

Building something?

Start a project