← Blog

RAG

Supabase RAG audit: the checklist we run before quoting

It's a Tuesday at 14:00 in Utrecht. A CTO hands us read-only Supabase credentials and asks for a number. Before we quote a RAG retrofit, we audit.

Jacob Molkenboer· Founder · A Brand New Company· 20 Jun 2026· 8 min
Open oak index-card drawer with cream cards, brass divider, green tab, ledger bundle on ivory paper.

It's a Tuesday at 14:00 in Utrecht. The CTO of a 40-person logistics SaaS hands us read-only credentials to their Supabase project and asks for a number. They want a RAG agent over their 11,000-document support archive, scoped per tenant, live by Q3. Before we quote, we run an audit. Not a demo. Not a "let's spin up an n8n flow." An audit, because a RAG retrofit on a cluster you didn't build is almost always more dangerous than a greenfield one: the bugs are already there, just dormant, waiting for an auth.uid() you forgot to add.

This is the checklist we use. It takes a senior engineer about four hours and it has saved us, conservatively, three rebuilds.

Why we audit before quoting

A RAG-agent retrofit sits on top of three things the client usually thinks already work: row-level security, embedding lineage, and tenant isolation in backups. None of them tend to work the way the team remembers.

When we quote without auditing, we either price in a phantom rebuild and lose the deal, or we price the happy path and eat the rebuild on our margin. Neither is a business. So the audit is the deal. We charge a fixed fee for a written report, and if the client commissions the build we credit it back.

The audit has three movements. RLS drift. Embedding versioning. Backup integrity.

The RLS drift sweep on the top 25 tables

The first movement is mechanical. We list the 25 largest public-schema tables by row count, then for each one we check four things:

  1. Is RLS enabled?
  2. Does at least one policy reference auth.uid() or a tenant_id lookup?
  3. Does a service_role policy exist that bypasses the tenant filter?
  4. Has any policy been replaced more recently than the table's own DDL?

That last one is the one that catches people. A team enables RLS in February, ships a feature in April, and somewhere in the middle a junior renames a policy from tenant_isolation to read_own_rows and accidentally drops the WITH CHECK half. The select side keeps working. The insert side gets a hole.

We run this with a single query. Here's the one we hand to clients at the end of the engagement, runnable as-is:

-- top 25 public tables and their RLS posture
WITH sized AS (
  SELECT c.oid, n.nspname, c.relname,
         pg_total_relation_size(c.oid) AS bytes
  FROM   pg_class c
  JOIN   pg_namespace n ON n.oid = c.relnamespace
  WHERE  c.relkind = 'r' AND n.nspname = 'public'
  ORDER  BY bytes DESC
  LIMIT  25
)
SELECT s.relname,
       c.relrowsecurity            AS rls_on,
       c.relforcerowsecurity       AS rls_forced,
       COUNT(p.polname) FILTER (
         WHERE p.polqual::text ILIKE '%tenant_id%'
            OR p.polqual::text ILIKE '%auth.uid%'
       ) AS tenant_policies,
       COUNT(p.polname) FILTER (
         WHERE p.polroles::text ILIKE '%service_role%'
       ) AS svc_policies
FROM   sized s
JOIN   pg_class c   ON c.oid = s.oid
LEFT   JOIN pg_policy p ON p.polrelid = c.oid
GROUP  BY s.relname, c.relrowsecurity, c.relforcerowsecurity
ORDER  BY rls_on, tenant_policies;

Then we score. RLS off on any table holding user content: red. RLS on but zero tenant-aware policy: red. RLS on, tenant policy present, but a service_role policy that selects without a WHERE: yellow, because a RAG agent that runs as service_role will silently cross tenants the moment somebody forgets to thread auth.uid() through a SECURITY DEFINER function.

We've seen the yellow-turns-red pattern on roughly six of the last ten audits. The Supabase docs are clear that policies are permissive by default and any matching policy grants access, but in practice the team reads "RLS is on" as "we are safe" and stops there.

Warning

If your RAG agent connects as service_role "for convenience," every retrieved chunk has to carry its own tenant_id through the filter at the SQL level, not the prompt level. A model will happily ignore a system instruction. Postgres will not ignore a WHERE clause.

Embedding model versioning against the AVG re-index window

The second movement is the one Dutch teams usually haven't thought about. The AVG (the Dutch name for the GDPR, supplemented by the UAVG implementation act) gives a data subject the right to erasure. For a RAG system, that means a single delete has to propagate to the source row, every embedding row that points at it, every cached chunk, and any vector that ended up in a backup older than your retention window.

If your embedding model is versioned (and after the text-embedding-3-large shift in 2024 and the open-weights churn of 2025, it should be) you also have to handle re-index. Models drift. Dimensions change. We've now had to re-index three production clusters because the team upgraded from text-embedding-3-small (1536d) to a 3072d model and forgot that the pgvector index was hard-coded to the old dimension.

The audit checks two things here:

  • Is there an embeddings_model column on the vector table, or a sibling lineage table mapping chunk_idmodel_namemodel_versionembedded_at?
  • What is the worst-case re-index window for the full corpus, and does it fit inside the AVG's 30-day erasure clock with margin?

A quick way to get the worst-case window: embed a 1,000-chunk sample against the new model, measure tokens per second under your real rate limit, multiply, and add 25%. If the result is over 21 days for the full corpus, you do not have margin for a single failed run.

-- lineage check: are chunks tagged with the model that embedded them?
SELECT model_name, model_version,
       COUNT(*)         AS chunks,
       MIN(embedded_at) AS first_seen,
       MAX(embedded_at) AS last_seen,
       pg_size_pretty(SUM(pg_column_size(embedding))) AS vector_bytes
FROM   rag.chunk_embeddings
GROUP  BY model_name, model_version
ORDER  BY last_seen DESC;

If that query throws "column does not exist," you have your first finding. We write it up plainly: "no embedding lineage; a model upgrade requires a full re-embed with no fallback, estimated 14 days at current rate limits; an AVG erasure request during that window cannot be satisfied within the statutory term." It tends to focus the conversation.

For the actual pgvector setup we still point teams at the pgvector README on index types and dimensions, because the HNSW vs IVFFlat trade-off is not the thing the audit decides. The lineage column is.

The pg_dump survival test at 02:00

The third movement is the one that gets the report taken seriously. We take the client's nightly pg_dump (almost always running between 02:00 and 03:00 Europe/Amsterdam), pull last night's dump into a sandbox cluster, restore it, and run one query:

SELECT COUNT(DISTINCT tenant_id)              AS tenants_in_dump,
       COUNT(*) FILTER (WHERE tenant_id IS NULL) AS orphans,
       MIN(created_at)                        AS earliest,
       MAX(created_at)                        AS latest
FROM   rag.documents;

Then the matching query against the live cluster. If the numbers diverge by more than rounding, the dump is not a real backup of the RAG corpus. Usually because someone added a partitioned table after the original pg_dump flags were set, and --exclude-schema is now silently dropping the partition that holds 80% of the new tenant's documents. The pg_dump docs are explicit about how --schema and partitioned tables interact, but the script in cron.daily was written in 2022 and nobody has revisited it.

We then test the inverse: take the dump, restore it, drop one tenant's documents with a single DELETE WHERE tenant_id = $1, and confirm no other tenant's row count moves. If anything else moves, there is a foreign key cascade you didn't mean to grant, or a materialized view caching cross-tenant data.

The three knowledge bases that survived

Across the twelve audits we ran in 2025, three RAG knowledge bases passed the pg_dump-and-delete test cleanly on the first attempt. What did they have in common?

One. A single tenant_id uuid not null on every table in the RAG schema, including the join tables. No nullable tenant columns "for the shared docs." Shared docs lived in their own schema with their own retrieval path.

Two. Every embedding row had a foreign key to the document, and the document had a foreign key to the tenant, with ON DELETE CASCADE written out explicitly. No triggers doing the cascade in application code. The database did it, atomically, in the same transaction as the user's delete request.

Three. The nightly dump used --schema=rag --schema=public, plus a separate per-tenant logical export that ran every Sunday. The per-tenant export was the actual disaster recovery story; the cluster-wide dump was just for fast restore. When the team needed to migrate one tenant off, they had a file. When they needed to restore the whole cluster, they had a different file. Different jobs, different files.

The other nine clusters needed work before a RAG retrofit was safe. None of that work was dramatic. A column added here, a policy rewritten there, a cron job updated. But it was work that had to be priced, and the audit is what let us price it honestly.

What to do tomorrow morning

Run the top-25 RLS query above against your own cluster. Read the output. If any row holding user content shows rls_on = false, or tenant_policies = 0, you have a finding. Write it down. That is the first hour of the audit, and you can do it before standup.

When we ran this audit for a Rotterdam-based HR-tech client last quarter, the thing that broke was the third movement: their nightly dump excluded the partition that held the previous month's onboarding documents, so a tenant restore would have lost three weeks of data. We rewrote the dump script, added the lineage column, and built the RAG agent on top of a cluster we now trusted. The audit took half a day; the rebuild it prevented would have taken six weeks.

Key takeaway

A RAG retrofit is not a model problem. It is a tenancy problem with a vector index attached. Audit the tenancy before you quote the build.

FAQ

How long does the RAG pre-quote audit actually take?

About four hours for a senior engineer on a cluster with fewer than 200 tables. Most of that is reading existing policies and running the dump-restore-delete cycle in a sandbox.

Why score the top 25 tables and not all of them?

Row-count weighted coverage. The top 25 by size almost always contain the tenant-scoped data; smaller tables tend to be lookup or config. We still scan the rest, but at lower depth.

Does the AVG actually require re-indexing embeddings?

It requires erasure to propagate to all copies of personal data within a statutory term. If a vector is derived from personal data, deleting the source means deleting the vector. Re-index timing is how you prove you can.

Can we run the audit ourselves before talking to ABN?

Yes. The three SQL queries in this post are the ones we run first. If they come back clean, you probably do not need an external audit before a RAG retrofit.

ragknowledge baseai agentssecurityarchitectureoperations

Building something?

Start a project