RAG cross-tenant leak: the row-level policy that fixed it
A tenant asked about late fees and got two paragraphs from a contract that was not theirs. The seven-line incident report named the cause: retrieval without a tenant fence at the database.

11:14 on a Tuesday
The ticket said: "Why is your bot quoting our competitor's payment terms back at us?"
Attached was a screenshot. Our retrieval-augmented agent, answering a question about late-fee handling, had served up a clean two-paragraph summary. The summary was accurate. The contract it was summarising belonged to a different customer.
We pulled the agent offline at 11:17. We had it patched and back up by 18:40. The fix was a fourteen-line Postgres policy. The incident report we filed for the affected tenant was seven lines long, because there was nothing more to say.
This post walks through the whole thing: what failed, what the report looked like, and the row-level policy that ended this class of bug.
The seven-line incident report
Incident: Cross-tenant content disclosure in RAG response
Detected: 2026-05-12 11:14 CEST via customer support ticket #4118
Scope: 1 affected tenant, 1 query, response contained 2 paragraphs sourced from another tenant's draft contract
Cause: Tenant isolation enforced only in application code; an off-path retrieval call accepted tenant_id=None and skipped the filter
Containment: Agent disabled at 11:17 CEST; affected response and source chunks quarantined
Fix: Row-level security on doc_chunks with FORCE, transaction-scoped app.tenant_id GUC, rollout completed 18:40 CEST
Customer impact: 2 paragraphs of a non-executed draft, no PII, no financial data. Notification to both tenants sent 19:02 CEST.
We deliberately kept it short. A long report is a way of hiding a small bug behind language. Seven lines forces honesty about what happened and what we changed.
What our RAG actually did
The stack is unremarkable. Customer documents land in object storage, get chunked, get embedded, and get written to a Postgres table with pgvector. At query time, we embed the user's question, run a k-NN search, and stuff the top-k chunks into the prompt. The system was multi-tenant from day one: every chunk row has a tenant_id, and the retriever filtered on it, like this:
```sql
SELECT chunk_id, content
FROM doc_chunks
WHERE tenant_id = $1
ORDER BY embedding <=> $2  -- pgvector cosine distance, nearest first
LIMIT 8;
```
That query is fine. The problem is that it is not the only query.
Somewhere in the codebase's history, someone had written an internal endpoint for previewing how the retriever ranked a document. It called the same Python class, but it passed tenant_id=None for "preview as superuser." When the retrieval class saw None, it removed the filter.
Then someone wired that preview endpoint into a different code path. Then the agent started using it for a specific kind of fallback that the original author had not anticipated. Then a customer asked a question whose top-8 nearest neighbours happened to include two chunks from another tenant's draft.
Nobody wrote a line of code that said "leak this." It leaked because the only thing standing between tenants was a Python parameter that defaulted to None.
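To make the failure concrete, here is a minimal, hypothetical reconstruction of the pattern: a query builder where None quietly means "no filter", next to a stricter variant that refuses to build an unscoped query. Neither is our actual retriever class, and both function names are invented for illustration. The strict version removes the None trap, but it is still application code, so it is still not a boundary.

```python
from typing import Optional
from uuid import UUID

# Hypothetical reconstruction for illustration; not the production retriever.
# $1 is the query embedding, $2 the tenant, in asyncpg-style placeholders.
SELECT_SQL = "SELECT chunk_id, content FROM doc_chunks "
ORDER_SQL = "ORDER BY embedding <=> $1 LIMIT 8"


def build_query_buggy(tenant_id: Optional[UUID]) -> str:
    """The bug's shape: None meant "preview as superuser", so the filter vanished."""
    if tenant_id is None:
        return SELECT_SQL + ORDER_SQL
    return SELECT_SQL + "WHERE tenant_id = $2 " + ORDER_SQL


def build_query_strict(tenant_id: UUID) -> str:
    """No unscoped mode: anything that is not a UUID is a programming error."""
    if not isinstance(tenant_id, UUID):
        raise TypeError("tenant_id must be a UUID; there is no unscoped query")
    return SELECT_SQL + "WHERE tenant_id = $2 " + ORDER_SQL
```

A single None flowing in from a new code path is enough to turn the buggy version into a query over every tenant, and nothing in its output looks wrong.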
Filtering in application code is not a boundary
This is the part worth sitting with. If your only tenant isolation is a WHERE clause that your application is responsible for adding, you have not isolated tenants. You have politely asked them not to talk to each other.
Anything that bypasses the application layer will skip the filter: an internal script, a debug endpoint, a Jupyter notebook one of your engineers runs against the prod replica, a future maintainer reading docs that say "set tenant_id=None to skip filter," a refactor that introduces a second code path you forgot existed.
This is exactly the pattern OWASP catalogued in its 2025 LLM Top 10 under sensitive information disclosure: the model is not the leak, the data path feeding it is. The mitigation that actually works is structural access control at the storage layer, not instructions inside a prompt.
It is also the same lesson that keeps showing up in writeups about how vendors contain language models across products. The interesting boundary is not what the model is told. It is what the model is allowed to see.
The row-level policy that ended it
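Postgres has had row-level security since 9.5. With pgvector, RLS composes cleanly with vector search, because vector search is just a query against a table. The policy is attached as an extra predicate to every query against the table, so out-of-tenant chunks cannot reach the result set, whichever code path issued the query. One caveat: with an approximate index such as HNSW, the predicate is applied to the candidates the index returns, so a query for a small tenant can come back with fewer than k rows. That is a recall problem, not a leak.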
Here is the shape of the migration we shipped, simplified to the bits that matter:
```sql
-- 1. Make tenant_id non-null and indexed.
ALTER TABLE doc_chunks
  ALTER COLUMN tenant_id SET NOT NULL;
CREATE INDEX IF NOT EXISTS doc_chunks_tenant_idx
  ON doc_chunks (tenant_id);
-- 2. Turn RLS on, and force it for table owners too.
ALTER TABLE doc_chunks ENABLE ROW LEVEL SECURITY;
ALTER TABLE doc_chunks FORCE ROW LEVEL SECURITY;
-- 3. The policy: only rows matching this transaction's app.tenant_id.
-- An unset GUC becomes NULL, which matches no row at all.
CREATE POLICY tenant_isolation ON doc_chunks
  FOR ALL
  USING (tenant_id = NULLIF(current_setting('app.tenant_id', true), '')::uuid)
  WITH CHECK (tenant_id = NULLIF(current_setting('app.tenant_id', true), '')::uuid);
```
The agent's connection pool now sets app.tenant_id at the start of every transaction, derived from the authenticated request, not from a function argument:
```python
from uuid import UUID


async def with_tenant(conn, tenant_id: UUID) -> None:
    # `true` makes the setting local to the current transaction, so call this
    # inside `async with conn.transaction():`. Outside a transaction, asyncpg
    # autocommits the statement and the setting disappears immediately.
    await conn.execute(
        "SELECT set_config('app.tenant_id', $1, true)",
        str(tenant_id),
    )
```
Now the same broken admin endpoint, if it ran today on a connection with no tenant bound, would return zero rows. Not "fewer rows." Zero. There is no value of the Python function argument that lets the query see another tenant's data, because the Python function argument is no longer involved in the access decision.
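Because set_config(..., true) only lasts until the end of the transaction, it is easy to call with_tenant outside a transaction and bind nothing. One way to make that mistake hard is to hand out tenant-bound connections only through a context manager. The sketch below assumes an asyncpg-style pool (pool.acquire() and conn.transaction() used as async context managers, conn.execute() with $n parameters); the name tenant_transaction is hypothetical.

```python
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator
from uuid import UUID


@asynccontextmanager
async def tenant_transaction(pool: Any, tenant_id: UUID) -> AsyncIterator[Any]:
    """Yield a pooled connection whose current transaction is bound to one tenant.

    Hypothetical helper. Assumes an asyncpg-style pool: pool.acquire() and
    conn.transaction() are async context managers, and conn.execute() takes
    $n placeholders followed by positional arguments.
    """
    if not isinstance(tenant_id, UUID):
        # Same rule as the retriever: no None, no "superuser" mode.
        raise TypeError("tenant_id must be a UUID from the authenticated request")
    async with pool.acquire() as conn:
        async with conn.transaction():
            # is_local = true: the setting ends with this transaction, so a
            # connection returned to the pool does not carry the tenant along.
            await conn.execute(
                "SELECT set_config('app.tenant_id', $1, true)", str(tenant_id)
            )
            yield conn
```

A request handler would then run every retrieval inside `async with tenant_transaction(pool, tenant_id) as conn:`. When the block exits, the transaction commits or rolls back and the tenant binding goes with it.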
RLS without FORCE is a suggestion. Table owners bypass policies by default, and many applications connect as the role that owns their tables. Add FORCE ROW LEVEL SECURITY, then audit which roles are superusers or hold BYPASSRLS, since both skip policies even with FORCE. We grant BYPASSRLS to exactly zero application roles. Migrations run under a separate role with its own audit trail.
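A sketch of that audit, assuming you can run one read-only query against the catalog: the pg_roles view exposes rolsuper and rolbypassrls, and either flag lets a role skip RLS on a table even with FORCE set. The allow-list and the function name are illustrative choices; who belongs on the list is a policy decision, not something the code can answer.

```python
from typing import Iterable, List, Mapping, Set

# pg_roles is a standard catalog view; rolsuper and rolbypassrls are real columns.
ROLE_AUDIT_SQL = """
SELECT rolname, rolsuper, rolbypassrls
FROM pg_roles
WHERE rolsuper OR rolbypassrls
ORDER BY rolname
"""


def rls_bypassers(rows: Iterable[Mapping], allowed: Set[str]) -> List[str]:
    """Roles that skip RLS (superuser or BYPASSRLS) and are not on the allow-list."""
    return sorted(
        row["rolname"]
        for row in rows
        if (row["rolsuper"] or row["rolbypassrls"]) and row["rolname"] not in allowed
    )
```

With asyncpg, `rows = await conn.fetch(ROLE_AUDIT_SQL)` returns records that support lookup by column name, so they can be passed straight in. A non-empty result is a reasonable thing to fail a CI check on.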
The test that would have caught us
We did not have this test before. We have it now. Two tenants, one chunk each: run the retriever as tenant A, assert no tenant B content comes back, then do the reverse.
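```python
from uuid import uuid4

import pytest


@pytest.mark.asyncio
async def test_retriever_cannot_cross_tenants(pool):
    # pool, seed_chunk, with_tenant and retrieve are app test helpers. seed_chunk
    # must bind the tenant itself: WITH CHECK applies to inserts too.
    a, b = uuid4(), uuid4()
    await seed_chunk(a, "Late fees: 2% per month after day 30.")
    await seed_chunk(b, "Late fees: 5% per month after day 14.")
    # (tenant to act as, text it must see, text it must never see)
    for tenant, own, other in ((a, "2%", "5%"), (b, "5%", "2%")):
        async with pool.acquire() as conn:
            # set_config(..., true) lives only as long as the transaction.
            async with conn.transaction():
                await with_tenant(conn, tenant)
                hits = await retrieve(conn, "what is the late fee", k=10)
        contents = [h.content for h in hits]
        assert any(own in c for c in contents)
        assert not any(other in c for c in contents)
```
The point of this test is not to prove the policy works once. It is to fail loudly the day someone refactors the connection pool and forgets to call with_tenant. Without RLS, that refactor leaks silently. With RLS, the test goes red because the agent suddenly retrieves nothing at all. "Retrieves nothing" is a much cheaper failure mode than "retrieves the wrong tenant."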
What we kept, and what we ripped out
We kept the application-level WHERE clause. Defence in depth costs nothing here, and it lets the query planner skip past out-of-tenant rows even when statistics drift. The RLS policy is the boundary. The WHERE clause is a hint.
We ripped out the "tenant_id=None means superuser" pattern across the codebase. There is no superuser of tenant data inside the application. If a human at our company needs to inspect a tenant's chunks for debugging, they connect with their own database role, which has a policy explicitly listing the tenants they support, and every query they run is logged with their identity.
We also moved evaluation fixtures to a separate schema. The eval set used to live in the same table under a pseudo-tenant called 'eval'. That is the kind of cute idea that grows teeth. Now eval data lives in eval_chunks, with its own retriever, and it cannot accidentally be reached by tenant-scoped code.
The audit log we wish we had on day one
Since the fix, every retrieval writes one log line with the requesting tenant_id, a hash of the query embedding, the IDs of the rows returned, and the tenant_id of each row. With RLS in place, that last column is constant, by construction, equal to the requester. So the log becomes a tripwire rather than a debug aid: any line where the two tenant_ids disagree is, by definition, a bug in our policy. We alert on it. So far, zero hits.
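The tripwire is simple enough to sketch. The version below assumes the retriever returns each row's chunk_id alongside its tenant_id, and hashes the query embedding as little-endian float32 (the precision pgvector's vector type stores) so the log never holds the raw vector. RetrievalLog and make_log are hypothetical names, and the alerting hook is left out.

```python
import hashlib
import struct
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple
from uuid import UUID


@dataclass(frozen=True)
class RetrievalLog:
    requesting_tenant: UUID
    query_hash: str
    # (chunk_id, tenant_id of that row) for every row the retriever returned.
    rows: Tuple[Tuple[int, UUID], ...]

    def mismatches(self) -> List[int]:
        """Chunk IDs whose tenant differs from the requester's. Should always be []."""
        return [cid for cid, tenant in self.rows if tenant != self.requesting_tenant]


def make_log(
    tenant: UUID, embedding: Sequence[float], rows: Iterable[Tuple[int, UUID]]
) -> RetrievalLog:
    # Hash the embedding as little-endian float32 so the raw vector is never logged.
    packed = struct.pack(f"<{len(embedding)}f", *embedding)
    return RetrievalLog(tenant, hashlib.sha256(packed).hexdigest(), tuple(rows))
```

After each query, something like `if log.mismatches(): alert(log)` runs, where alert stands in for whatever paging hook you already have. With RLS doing its job, that branch should never fire.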
The other thing we changed: incident reports for tenants stay short. The temptation when something goes wrong is to write three pages explaining how serious you are about security. The tenant does not want three pages. They want to know what happened, what you changed, and whether it can happen again. Seven lines is enough room to say all three.
The pattern, generalised
If you ship a RAG agent for any multi-tenant workload, ask yourself this: if I removed every WHERE clause from my application code today, would another tenant's data become reachable? If the answer is yes, the database is not enforcing isolation. The application is, until it isn't.
This generalises beyond RAG. The same logic applies to background jobs that touch customer data, to admin tools, to LLM evaluation harnesses, to anything that reads from a shared table. The cost of RLS at the start of a project is roughly one afternoon. The cost of retrofitting it after a leak is the leak itself, plus the trust you spend writing the notification email.
If you are running a RAG agent on shared infrastructure
When we built the document-retrieval agent for a Dutch property firm earlier this year, the thing we ran into was exactly this: the temptation to put tenant logic in the agent's tool layer, where it was easy to reason about, instead of in the database, where it was hard to bypass. We ended up solving it the way described above, and we now ship every new AI agent with RLS on by default.
The five-minute audit you can run today: open one production query that touches tenant data, copy it into a SQL client, remove the tenant filter, and run it against a non-prod replica without binding a tenant. Connect as your application's database role, not a superuser: superusers bypass RLS, so a superuser session proves nothing. If you get rows, you have a bug. The size of the bug is the size of that result set.
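If you want to script that check rather than eyeball it, the counting half is a few lines. This sketch assumes you have already run the unfiltered query (for example, SELECT tenant_id FROM doc_chunks) as the application role and collected the tenant_id column. leak_size is a hypothetical helper, and bound_tenant is whichever tenant you set in app.tenant_id, or None if you set none.

```python
from collections import Counter
from typing import Iterable, Optional
from uuid import UUID


def leak_size(row_tenants: Iterable[UUID], bound_tenant: Optional[UUID]) -> Counter:
    """Count returned rows per tenant other than the one bound to the session.

    With RLS and FORCE in place, this should be empty whether or not a tenant
    is bound; any non-zero count is the size of the bug.
    """
    return Counter(t for t in row_tenants if t != bound_tenant)
```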
Key takeaway
If removing your application's WHERE clause exposes another tenant's data, the database is not enforcing isolation. RLS with FORCE is the only fix that survives a refactor.
FAQ
Does row-level security slow down pgvector queries?
Not meaningfully in our workload. RLS adds a predicate the planner already had to evaluate via our WHERE clause. The tenant_id index makes both equally cheap. Measure on your own data before assuming overhead.
Why not give each tenant a separate schema or database?
It works, but it pushes the boundary into ops: migrations, backups and connection pooling all multiply by tenant count. RLS keeps one schema and one pool while still enforcing isolation at the row level.
What if your vectors live in a managed vector DB, not Postgres?
Check whether the vendor supports per-namespace API keys or scoped tokens, and treat metadata filters as application logic, not a boundary. If isolation lives only in a filter you pass at query time, you have the same bug we did.
Could the language model still leak data already pulled into context?
Yes. RLS stops cross-tenant retrieval, not in-context confusion. Keep one tenant's data per prompt, and never mix tenants in a single agent session even for evaluation or batching.
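RLS cannot see the prompt, so keeping one tenant per prompt is an application-side check. A minimal sketch, assuming each retrieved chunk carries its tenant_id: refuse to assemble a prompt if any chunk belongs to a tenant other than the requester. Chunk, CrossTenantContext and the prompt template are hypothetical, and this is a backstop for batching and evaluation code paths, not a replacement for the database policy.

```python
from dataclasses import dataclass
from typing import Sequence
from uuid import UUID


@dataclass(frozen=True)
class Chunk:
    tenant_id: UUID
    content: str


class CrossTenantContext(Exception):
    """Raised when a prompt would mix data from more than one tenant."""


def build_prompt(question: str, chunks: Sequence[Chunk], tenant_id: UUID) -> str:
    # Any tenant in the context other than the requester's is a hard stop.
    foreign = {c.tenant_id for c in chunks} - {tenant_id}
    if foreign:
        raise CrossTenantContext(f"{len(foreign)} foreign tenant(s) in context")
    context = "\n\n".join(c.content for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"
```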
How did you tell the affected tenant?
Same day, by email, with the seven-line report attached and a named contact for follow-up questions. We also told the tenant whose data leaked, on the same day, with the same report.