Pinecone, pgvector, Qdrant: three months in a Dutch RAG

On a Tuesday in March a Dutch city archivist asked our pilot RAG where a 2014 council decision lived. Three months later we know which vector store stayed, which one we wound down, and why.

Jacob Molkenboer · Founder · A Brand New Company · 5 Jun 2026 · 9 min

On a Tuesday in March a city archivist in a mid-sized Dutch municipality opened our pilot search box and typed: "wat besloot de raad in 2014 over het busstation?" ("what did the council decide in 2014 about the bus station?"). The previous system, a SharePoint index sitting on top of fifteen years of council minutes, returned zero hits. Our pilot returned the right set of minutes, the right page, and the relevant decision in 1.8 seconds. The archivist sent a one-line reply: "eindelijk" ("finally").

That pilot is now three months old. We promised the client we would write down what we learned about the three vector stores we ran in parallel: Pinecone, pgvector, and Qdrant. Not a benchmark with synthetic data, not a Reddit-thread shootout. The result of running the same 400,000-document Dutch corpus on three systems through one quarter of real questions from real citizens.

The brief

The corpus is fifteen years of municipal output. Council minutes, motion texts, departmental memoranda, building permit decisions, and a long tail of scanned PDFs from before the digital workflow took hold. Around 400,000 documents after deduplication, with a heavy OCR pass on the older scans. Dutch language throughout, with the usual municipal vocabulary (raadsbesluit, ruimtelijke ordening, motie, amendement, omgevingsvergunning). Around 11 million chunks after our windowing pass.
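
The windowing pass is not described further here. As a rough sketch of what such a pass can look like, the function below splits a document into overlapping word windows. The function name, the 200-word window, and the 40-word overlap are illustrative assumptions, not the settings used in the pilot.

```python
from typing import List


def window_chunks(text: str, size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into overlapping windows of whitespace-separated words.

    size and overlap are hypothetical defaults, not the pilot's settings.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk, at the cost of a larger chunk count.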

Three non-negotiables came from the procurement team, not from us:

  • Data must stay inside the EU. Frankfurt or Amsterdam, no exceptions.
  • Backups must fit the existing BIO (Baseline Informatiebeveiliging Overheid) checklist that the municipal DBA already runs against every database in the building.
  • One DPA, not three. The fewer external processors on the AVG (Algemene Verordening Gegevensbescherming, the GDPR) register, the better.

That last point is the one most engineering write-ups skip. For a Dutch municipality, every extra sub-processor is a meeting with the FG (Functionaris Gegevensbescherming) and a fresh row on a register that already has thirty entries. The technical choice and the contractual choice are not separable. If you forget that, your benchmark winner gets vetoed by someone who has never read a vector database whitepaper, and they are right to do it.

Why we ran all three for a quarter

A benchmark without a deadline is a hobby. A benchmark on synthetic data tells you nothing about how a Friday afternoon's reindex collides with a Monday morning's read traffic. So we built the same retrieval pipeline three times: one writing to Pinecone serverless in eu-west, one writing to pgvector on the same Postgres that held our document metadata, and one writing to a self-hosted Qdrant cluster on the municipality's existing Kubernetes.

All three indexed the same 11M chunks from the same 1024-dim multilingual embeddings. Same chunker, same rerank step, same prompt template. The only variable was the vector store. We mirrored production traffic to all three for the full quarter, but served user-facing answers from only one of them at a time, rotating weekly so each store felt real load.
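
One simple way to express a weekly rotation is to derive the serving store from the ISO week number. This is a sketch under assumptions: the store labels and the function name are hypothetical, and the real rotation may have been driven by a config flag instead.

```python
from datetime import date

STORES = ["pinecone", "pgvector", "qdrant"]  # illustrative labels


def read_store_for(day: date) -> str:
    """Pick the store that serves live reads, rotating once per ISO week."""
    iso_week = day.isocalendar()[1]
    return STORES[iso_week % len(STORES)]
```

Every day in the same ISO week maps to the same store, so the switch happens on Monday. The sequence can repeat or skip a store at the year boundary, which is acceptable for a one-quarter pilot but worth knowing.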

Pinecone, the managed bet

Pinecone was the fastest to stand up. A handful of API calls and we were inserting vectors. The eu-west region kept the data inside the EU at rest and in transit. Median latency was the lowest of the three: around 35 ms server-side for a top-10 ANN search with metadata filters applied.

Filter performance was the pleasant surprise. We filter heavily on afdeling (department), jaar (year), and documenttype (document type), because a citizen rarely cares about the whole archive. They care about the housing department in 2019. Pinecone's metadata filtering handled these without the recall drop we sometimes see when filters are pushed onto an ANN index post-hoc.

What killed it for this client was not the engineering. It was the procurement officer reading "Pinecone Systems Inc., Palo Alto" on the DPA. The technical eu-west data path was fine. The contractual surface was an extra US-headquartered sub-processor on an AVG register that already had its limit of patience. We ran the full quarter so we could prove we had not priced out a great option, but the verdict was decided in week one in a room we were not in.

pgvector, Postgres doing the heavy lifting

pgvector is now the first thing we reach for on most projects in this size range. It is a Postgres extension. If your Postgres is in eu-central, your vectors are in eu-central. Your backups are already a solved problem because they are Postgres backups, and your DBA already knows how to restore them at 03:00.

For this corpus we used HNSW (added to pgvector in 0.5.0), 1024-dim vectors, cosine distance:

-- 400k documents, ~11M chunks, 1024-dim multilingual embeddings
CREATE INDEX documents_embedding_idx
ON document_chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);

-- query-time recall/latency knob
SET hnsw.ef_search = 80;

Index build on a db.r6g.large took about 47 minutes for the full 11M chunks. Median search latency at ef_search = 80 settled around 55 ms server-side. Slower than Pinecone, well inside our 1.8 second end-to-end budget. The rerank step dominates anyway, and no vector database wins that race.

The advantage we did not predict was filter pushdown. Because the chunk metadata lives in the same table as the embedding, a query like "council decisions from the ruimte department in 2014" becomes a plain WHERE clause the planner combines with the HNSW index. We did not have to design a dual-store query layer or keep two systems in sync. We wrote SQL. The municipal DBA read it and nodded. That is worth a lot.
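
A filtered search of that kind can be sketched as a small query builder. The column names afdeling, jaar, and id, the function name, and the psycopg-style %s placeholders are assumptions for illustration; only the table name document_chunks and the embedding column come from the index definition above.

```python
from typing import List, Sequence, Tuple


def filtered_search_sql(
    query_vec: Sequence[float], afdeling: str, jaar: int, limit: int = 10
) -> Tuple[str, List[object]]:
    """Build a parameterised pgvector query: metadata filter plus cosine ordering.

    Column names afdeling, jaar, and id are assumed for illustration.
    """
    # pgvector accepts a text literal of the form '[0.1,0.2,...]' cast to vector
    vec_literal = "[" + ",".join(f"{x:.6f}" for x in query_vec) + "]"
    sql = (
        "SELECT id, embedding <=> %s::vector AS distance "
        "FROM document_chunks "
        "WHERE afdeling = %s AND jaar = %s "
        "ORDER BY embedding <=> %s::vector "  # <=> is cosine distance
        "LIMIT %s"
    )
    return sql, [vec_literal, afdeling, jaar, vec_literal, limit]
```

The returned pair can be handed to a driver call such as cursor.execute(sql, params). Depending on the pgvector version and how selective the filter is, the filter may be applied after the index scan, so it is worth checking the plan with EXPLAIN on real data.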

Warning

HNSW indexes do not like bulk reindexes during business hours. When we backfilled 30,000 newly scanned permits in month two, vacuum churn on the chunks table caused lock contention with the read path and search latency spiked to 600 ms for fifteen minutes. pg_repack and a 03:00 cron solved it permanently, but you only learn this in production.

Qdrant, the middle path

Qdrant was the most interesting technical option. Written in Rust, low memory footprint, built-in sparse-plus-dense hybrid search, payload filtering that is genuinely expressive. The self-hosted story is clean: a Helm chart, a few volumes, and you are running.

A filter on Qdrant looks like this from the client side:

from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

client = QdrantClient(host="qdrant.internal", port=6333)

# embedding: the 1024-dim query vector from the same model used at index time
hits = client.search(
    collection_name="raadsstukken",
    query_vector=embedding,
    # payload filter: every condition under must has to hold (logical AND)
    query_filter=Filter(
        must=[
            FieldCondition(key="afdeling", match=MatchValue(value="ruimte")),
            FieldCondition(key="jaar", match=MatchValue(value=2014)),
        ]
    ),
    limit=10,  # top-10 nearest neighbours that pass the filter
)

Latency landed between Pinecone and pgvector, around 45 ms server-side. The sparse-plus-dense hybrid was a real advantage on Dutch lexical queries where a name, a dossier number, or a street had to match exactly. Pinecone can do this with sparse vectors, but it is more setup. pgvector requires a separate full-text index and a fusion step in your application code.
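
For the pgvector case, the fusion step can be quite small. The sketch below uses reciprocal rank fusion, which is one common choice and not necessarily what any of the three pipelines used. The function name and the sample IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60, limit: int = 10) -> List[str]:
    """Fuse several ranked ID lists with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k = 60 is a commonly used default.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:limit]
```

Called as rrf_fuse([dense_ids, lexical_ids]), it rewards chunks that rank well in both lists without needing the two scoring scales to be comparable.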

The operational cost was real. Qdrant is one more system on the on-call rotation. One more Helm chart, one more set of dashboards, one more thing that can OOM during collection optimization (it did, once, during peak hours; we moved optimization to 03:00 and it has been quiet since). For a team that already runs Kubernetes and has the appetite, it is a fine choice. For a municipal IT department that already runs Postgres for thirty systems and Qdrant for none, it is one new neighbour to learn.

The numbers we actually measured

None of these are universal. They are specific to this client, this corpus, and this hardware. We share them because the shape of the comparison matters more than the digits.

Recall@10 against a 1,200-question evaluation set written by the archivist (the questions are not synthetic; they are the actual questions citizens have asked the help desk over the last two years):

  • Pinecone serverless, default settings: 0.93
  • pgvector HNSW, ef_search=80: 0.94
  • Qdrant, default settings: 0.94
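
For readers who want to reproduce this kind of measurement, here is a minimal sketch of a recall@k calculation. It assumes each question is labelled with a set of relevant chunk IDs and that recall is averaged per question; the exact scoring used for the figures above is not specified, and the function and argument names are hypothetical.

```python
from typing import Dict, List, Set


def recall_at_k(
    retrieved: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int = 10
) -> float:
    """Mean recall@k over all questions that have at least one relevant ID."""
    per_question = []
    for qid, gold in relevant.items():
        if not gold:
            continue  # skip questions without labelled answers
        top_k = set(retrieved.get(qid, [])[:k])
        per_question.append(len(gold & top_k) / len(gold))
    return sum(per_question) / len(per_question) if per_question else 0.0
```

Running the same function over the output of each store, with the same labelled questions, is what makes the three numbers comparable.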

The recall differences are inside the noise of the evaluation set. The end-user difference is invisible. What is not invisible is the operational and contractual cost. Pinecone's quarterly invoice was the highest of the three by a comfortable margin, because serverless storage and read units add up when you are filtering across 11M chunks. pgvector cost us the marginal CPU and storage on a Postgres instance we were already paying for. Qdrant cost us a small Kubernetes namespace and one more rotation on the on-call calendar.

What we kept, and why

We kept pgvector. Not because the benchmark was decisive. It was not. We kept it because:

  • The vectors live next to their metadata, in one database, with one backup, on one DPA.
  • The municipal DBA already runs Postgres for thirty other systems. The on-call surface did not grow by a single page in PagerDuty.
  • When the question is "what happens if this dies on a Saturday at 23:00", the answer is the same answer for every other Postgres instance in the building.

We kept Qdrant on the bench for the sparse-hybrid experiments we did not commit to production. When the archivist asks for retrieval by dossier number that must match exactly, that is the path we will reach for next quarter. We are honest with ourselves: if the sparse-hybrid story matures into a daily requirement, Qdrant will earn its way back in.

Pinecone we wound down. The technology was the most polished of the three. The contractual surface was the wrong shape for this customer. We would happily run it again for a private-sector client whose procurement team does not flinch at a US sub-processor.

What we would tell ourselves three months ago

Three things. First: pick the boring option that already lives in your stack, unless you have a technical reason that overrides procurement. For most municipal and SME work in the Netherlands, that means pgvector. Second: write the eval set before you pick the database. Twelve hundred real questions from the help-desk transcripts taught us more in one afternoon than a week of synthetic benchmarks would have. Third: the reindex story matters more than the search story. Search happens millions of times and is dull. Reindex happens once a quarter and is the thing that wakes you up at night.

When we built the municipal knowledge agent for this client, the question was never "which vector database is best?" but "which one can our ops team carry at 23:00 on a Saturday?" That answered itself once we drew the operations diagram on a whiteboard and put a name next to every box.

The five-minute audit you can run on Monday: open your vector store dashboard, find your largest collection, and ask your DBA what happens if it dies tonight. The honest answer tells you whether you picked the right one.

Key takeaway

The best vector store is the one your ops team can carry at 23:00 on a Saturday. For our Dutch municipal RAG that meant pgvector, with Qdrant on the bench.

FAQ

Is pgvector fast enough for 400,000 documents?

Yes. With HNSW and ef_search around 80 we measured median search latency near 55 ms on a db.r6g.large for 1024-dim embeddings. The rerank step is usually the real bottleneck, not the vector lookup.

Why not just pick Pinecone and move on?

For many teams it would be fine. Our client needed all data and metadata under one DPA inside the EU, and procurement preferred fewer external sub-processors. Pinecone passed on tech, lost on contractual surface.

Where does Qdrant fit best?

When you want sparse-plus-dense hybrid search out of the box and you already run Kubernetes. We kept Qdrant on the bench for exact-match dossier retrieval we have not yet committed to production.

How do you reindex without downtime?

We write to two collections during a switchover window, validate with a 200-query smoke test, then flip the read path with a config change. On pgvector we use CREATE INDEX CONCURRENTLY and pg_repack to avoid lock contention.
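
A smoke test of this kind can be sketched as a top-k overlap check between the old and the new index. The function name, the 0.9 threshold, and the two search callables are assumptions for illustration, not our production harness.

```python
from typing import Callable, Iterable, List

SearchFn = Callable[[str], List[str]]  # query text -> ranked chunk IDs


def smoke_test_passes(
    queries: Iterable[str],
    search_old: SearchFn,
    search_new: SearchFn,
    k: int = 10,
    min_overlap: float = 0.9,
) -> bool:
    """Return True when the new index agrees closely enough with the old one.

    Agreement is the mean share of the old top-k that the new top-k contains.
    """
    overlaps = []
    for query in queries:
        old_ids = set(search_old(query)[:k])
        new_ids = set(search_new(query)[:k])
        if old_ids:
            overlaps.append(len(old_ids & new_ids) / len(old_ids))
    # An empty query set never passes: there is nothing to validate against.
    return bool(overlaps) and sum(overlaps) / len(overlaps) >= min_overlap
```

Only when the check returns True would the config change that flips the read path go out; otherwise reads stay on the old collection.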
