Vector DBs for patent search: Pinecone, Weaviate, pgvector

Sunday 22:14 in Eindhoven. The re-embedding of 1.4M patent claims stalled at 38%, the runbook author is in Vietnam, and Monday's prior-art queue starts in nine hours.

Jacob Molkenboer · Founder · A Brand New Company · 26 Apr 2026 · 7 min

Sunday, 22:14, Eindhoven. The on-call engineer at a 21-person octrooi-bureau (patent attorney firm) gets a Slack ping from the embedding pipeline: the weekly re-embedding job on 1.4M patent claims has stalled at 38%. The runbook author is on holiday in Vietnam. Monday's first prior-art queue starts in nine hours.

This is the moment that decides which vector database you actually chose. Not the demo on Tuesday. The 22:14 ticket.

We spent eight weeks last spring helping that bureau pick its embedding layer. Three candidates made it to the bake-off: Pinecone serverless, Weaviate on Hetzner, and a hand-rolled pgvector setup on a Postgres 16 cluster the bureau already operated. The scoring sheet had three columns: per-octrooi (per-patent) cost at full load, replay-defensible vector history for the 20-year bewaarplicht (retention obligation) under the Rijksoctrooiwet (the Dutch Patents Act), and who actually patches the index at 22:14 on a Sunday.

The workload, in real numbers

The bureau runs prior-art searches across EPO, USPTO, and WIPO via the agent we built on top of their corpus. Twenty-one attorneys and paralegals; twenty-five to thirty retrievals each per working day; roughly 2,800 retrievals a week. Each retrieval pulls a top-k of 200 from the vector index, hands the chunks to a reranker, then to a citation step that resolves claim numbers back to the source PDFs.

The corpus is 1.4M claim embeddings, growing by about 14,000 a week as new applications publish. Embeddings are 1,536-dimensional. The team re-evaluates the embedding model every quarter; when they swap it, the whole corpus is re-embedded over a weekend. That is the job that stalled.
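To put a rough size on that corpus, here is a back-of-envelope sketch. It assumes single-precision storage at 4 bytes per dimension plus 8 bytes of per-vector overhead, which is how pgvector documents its vector type; row headers and index size are not included, and the function names are ours.

```python
# Back-of-envelope sizing for the corpus described above.
# Assumption: 4 bytes per dimension + 8 bytes overhead per vector
# (pgvector's documented storage for its `vector` type).
# Row headers and the HNSW index itself are NOT included.

DIMS = 1536
VECTORS = 1_400_000
WEEKLY_GROWTH = 14_000


def vector_bytes(dims: int) -> int:
    """Bytes for one stored vector, excluding row and index overhead."""
    return 4 * dims + 8


def corpus_gb(n_vectors: int, dims: int = DIMS) -> float:
    """Raw vector payload in decimal gigabytes."""
    return n_vectors * vector_bytes(dims) / 1e9


current_gb = corpus_gb(VECTORS)
after_one_year_gb = corpus_gb(VECTORS + 52 * WEEKLY_GROWTH)

print(f"today: {current_gb:.1f} GB, after one year: {after_one_year_gb:.1f} GB")
```

Every embedding model version that is kept alongside the current one adds another copy of this payload, so the number to budget for is this figure times the number of retained versions.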

Three numbers carry the whole decision: storage cost per million vectors, query cost per thousand retrievals, and the cost of one unplanned engineer-hour at 22:14.

Pinecone serverless: fast in, slow to audit

Pinecone's serverless tier is the fastest path to a working index. We had a 1.4M-vector index serving p95 under 90ms in an afternoon. At this load, billing came out to roughly €480/month all-in. Cheap, by any measure that ignores the next two columns.

The audit column is where Pinecone got harder. The Rijksoctrooiwet wants a bureau to keep its file for 20 years after the application closes. For a prior-art search that means: in 2046 you should be able to prove which embedding of which claim text returned which hit on which date. Pinecone's index is mutable. You can snapshot a serverless index, but snapshots are not a first-class versioning surface. If you want to replay a 2026 retrieval in 2034 with the exact 2026 index state, you are stitching together collection backups, metadata blobs, and a frozen copy of the embedding model binary. Doable. Not trivially auditable.

The 22:14 column is the real one. When the re-embedding job stalls, the on-call engineer cannot SSH into Pinecone. They open a support ticket. Response time at 22:14 on a Sunday in Europe is measured in hours, not minutes.

Weaviate on Hetzner: flexible, second product

Weaviate is the middle option in almost every dimension. Two AX102 nodes at Hetzner (Ryzen 9 7950X3D, 128GB RAM, NVMe) ran about €220/month total. With HNSW and a warm cache, p95 sat in the 60ms range. The schema model lets you carry multiple embedding versions side by side, which is genuinely useful for the quarterly model swap.

Per-retrieval cost at full load was roughly half of Pinecone's. The audit story was workable: Weaviate's backup-restore is a first-class operation, and you can carry per-vector metadata for embedding-model version and retrieval timestamps.

The problem with Weaviate, for this bureau, was the operational footprint. They already ran a Postgres cluster for case files, billing, and the Dutch-language document store. Adopting Weaviate meant the on-call engineer had to learn a second stateful system, its backup story, its upgrade cadence, and its failure modes. The bureau has one and a half engineers. At 22:14, the engineer needs to grep a familiar log, not skim documentation.

pgvector on Postgres 16: boring, owned, replay-defensible

The third option was the least exciting and the one we shipped.

pgvector 0.7 on Postgres 16 gives you HNSW indexes inside the database the bureau already operates. The table looks like this:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE claim_embedding (
  claim_id        bigint        NOT NULL,
  source          text          NOT NULL,   -- 'EPO' | 'USPTO' | 'WIPO'
  embedded_at     timestamptz   NOT NULL DEFAULT now(),
  model_version   text          NOT NULL,   -- e.g. 'text-embedding-3-small' (1,536 dimensions)
  embedding       vector(1536)  NOT NULL,
  PRIMARY KEY (claim_id, model_version)
);

CREATE INDEX claim_embedding_hnsw
  ON claim_embedding USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

The retrieval path is a single SQL query the bureau's existing application stack already knows how to log, trace, and roll back:

SELECT claim_id, source, model_version,
       1 - (embedding <=> $1) AS score
FROM   claim_embedding
WHERE  model_version = $2
ORDER  BY embedding <=> $1
LIMIT  200;
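For readers who want to see what that query computes, here is a minimal exact (brute-force) equivalent in Python. It is a sketch, not the bureau's code: Row, exact_top_k, and recall_at_k are illustrative names, and it assumes no zero-length vectors. The practical use is as ground truth when checking how much recall an approximate HNSW index gives up.

```python
import math
from typing import NamedTuple


class Row(NamedTuple):
    claim_id: int
    source: str
    model_version: str
    embedding: tuple[float, ...]


def cosine_distance(a, b) -> float:
    """Same quantity as pgvector's <=> operator: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(rows, query, model_version, k=200):
    """Exact version of the retrieval: filter, score, sort, cut at k."""
    scored = [
        (r.claim_id, r.source, 1.0 - cosine_distance(r.embedding, query))
        for r in rows
        if r.model_version == model_version
    ]
    # Highest score first, i.e. smallest cosine distance first.
    scored.sort(key=lambda t: t[2], reverse=True)
    return scored[:k]


def recall_at_k(approx_ids, exact_ids) -> float:
    """Share of the exact result set that the approximate search also found."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact) if exact else 1.0


rows = [
    Row(1, "EPO", "v1", (1.0, 0.0)),
    Row(2, "USPTO", "v1", (0.0, 1.0)),
    Row(3, "WIPO", "v2", (1.0, 0.0)),   # other model version, filtered out
    Row(4, "EPO", "v1", (1.0, 1.0)),
]
top = exact_top_k(rows, (1.0, 0.0), "v1", k=2)
top_ids = [claim_id for claim_id, _, _ in top]
```

Running the exact version over a sample of real queries and comparing ids with recall_at_k is one way to decide whether an ef_search setting is good enough.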

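p95 on the 1.4M-row index, with hnsw.ef_search raised to 200, came in at 110ms on the bureau's existing 8-core production node. The setting has to be at least the LIMIT, because a pgvector HNSW scan returns at most ef_search candidates. Slower than Pinecone in absolute terms, fast enough that the reranker is the bottleneck.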
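One tuning detail worth spelling out: pgvector applies the WHERE filter after the HNSW scan, and the scan yields at most hnsw.ef_search candidates. So a LIMIT 200 query needs ef_search of at least 200, and more once several model versions share the table. The sketch below is rough arithmetic only; it assumes matching rows are spread uniformly among the candidates, and the function names are ours.

```python
import math


def expected_rows(ef_search: int, selectivity: float, limit: int) -> float:
    """Rough expected row count after the post-scan filter.

    selectivity is the fraction of rows matching the WHERE clause,
    e.g. 0.5 when two model versions of equal size share the table.
    """
    return min(limit, ef_search * selectivity)


def min_ef_search(limit: int, selectivity: float) -> int:
    """Smallest ef_search that is expected to fill the LIMIT."""
    return math.ceil(limit / selectivity)


one_version = min_ef_search(200, 1.0)    # only the current model in the table
two_versions = min_ef_search(200, 0.5)   # current model plus one retained version
short_page = expected_rows(80, 1.0, 200) # what ef_search = 80 would return
```

As far as we know pgvector caps hnsw.ef_search at 1,000, so this approach runs out of room after a few retained versions. The usual alternatives are a partial HNSW index per model_version, or the iterative index scans added in pgvector 0.8.0.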

The interesting bit is the columns the bureau did not have to add. embedded_at and model_version are already in the row. Point-in-time recovery on Postgres 16 covers the vector column as well as every other table, which means the bewaarplicht story for the embedding layer is the same story the bureau already wrote for case files: WAL archive to immutable object storage, retention policy 20 years, restore drill once a quarter.

Warning

If you are running a managed vector DB for a Dutch octrooi-bureau and your "20-year audit" plan is "we will export JSON snapshots into S3 once a quarter," read the Rijksoctrooiwet again. You need a defensible chain from claim text to embedding to retrieval, not just a folder of files.

The 20-year replay test

The Rijksoctrooiwet's bewaarplicht for octrooi files is twenty years from closure. For a prior-art search agent that means a third party can ask, in 2044, why your attorney cited US-1234567 as prior art for an application filed in 2026 — and you need to show them the retrieval.

We made the replay test the deciding interview question. For each candidate: reconstruct the exact top-200 result for a 2026 query, using only artifacts you would still have access to in 2044.

For pgvector, the answer was a one-liner. The query, the model version, and the row state at retrieval time are all in Postgres; WAL plus a snapshot of the embedding model binary in immutable storage replays the search. For Weaviate, the answer was three pages of runbook. For Pinecone, the answer involved a vendor that may or may not still operate the same SKU in 2044.
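One way to make "show them the retrieval" checkable is to store a small record per search and compare it against the replay. The following is a hypothetical sketch: RetrievalRecord, record_digest, and replay_matches are our own illustrative names, not part of the bureau's system or of any library.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RetrievalRecord:
    """What was asked, with which model, when, and what came back (in rank order)."""
    query_text: str
    model_version: str
    retrieved_at: str             # ISO 8601 timestamp, UTC
    claim_ids: tuple[int, ...]    # ranked result ids


def record_digest(record: RetrievalRecord) -> str:
    """SHA-256 over a canonical JSON form, so the stored record is tamper-evident."""
    payload = json.dumps(asdict(record), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def replay_matches(original: RetrievalRecord, replayed_ids) -> bool:
    """True when a replayed search returns the same ids in the same order."""
    return tuple(replayed_ids) == original.claim_ids


record = RetrievalRecord(
    query_text="heat exchanger with helical baffles",   # made-up example query
    model_version="v1",
    retrieved_at="2026-04-27T07:05:00+00:00",
    claim_ids=(101, 57, 998),
)
digest = record_digest(record)
```

The record deliberately stores ranked ids rather than scores: floating-point scores can differ in the last digits between hardware or library versions, while the ranking is what the attorney actually relied on.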

Per-octrooi cost at 2,800 runs a week

Normalising for the bureau's workload — 2,800 weekly retrievals, 1.4M vectors, quarterly re-embeds — the all-in cost per retrieval looked roughly like this:

Pinecone serverless: ~€0.04 per retrieval, plus an unbookable operational risk at the 22:14 column.
Weaviate on Hetzner: ~€0.02 per retrieval, plus a quarter-FTE of platform engineering the bureau did not have.
pgvector on Postgres 16: ~€0.004 marginal per retrieval, because the database was already paid for. Re-embedding compute is the only material new line.
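The arithmetic behind those figures is simple enough to check. This sketch uses only numbers from this article (the monthly bills and 2,800 retrievals a week) and assumes an average month of 52/12 weeks; the function name is ours.

```python
WEEKLY_RETRIEVALS = 2_800
WEEKS_PER_MONTH = 52 / 12   # assumption: average month length

MONTHLY_RETRIEVALS = WEEKLY_RETRIEVALS * WEEKS_PER_MONTH


def cost_per_retrieval(monthly_eur: float) -> float:
    """Infrastructure bill spread over the month's retrievals."""
    return monthly_eur / MONTHLY_RETRIEVALS


pinecone = cost_per_retrieval(480)   # roughly €480/month all-in
weaviate = cost_per_retrieval(220)   # roughly €220/month for two nodes

# Working backwards: the marginal monthly spend implied by ~€0.004 per retrieval.
pgvector_implied_monthly = 0.004 * MONTHLY_RETRIEVALS

print(f"Pinecone ~€{pinecone:.2f}, Weaviate ~€{weaviate:.2f} per retrieval")
print(f"pgvector marginal spend ~€{pgvector_implied_monthly:.0f}/month")
```

None of this prices the engineer-hours, which is the point of the next paragraph.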

Per-retrieval cost is the headline number every vendor pitch leads with. It is also the smallest of the three columns. Pick the layer that survives Sunday night.

What we shipped

We deployed pgvector on the bureau's existing Postgres 16 primary, with a streaming replica for read traffic, partitioned claim_embedding by source, and wrote the re-embedding job to checkpoint every 50,000 rows into a reembed_progress table. The Sunday 22:14 stall is now a SELECT * FROM reembed_progress ORDER BY started_at DESC LIMIT 1 away from "resume from row 500,001", the row after the last checkpoint written before the job stalled at 38%. The on-call engineer never needs to learn a new system at the worst moment of the week.
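Here is a minimal sketch of that checkpoint pattern. To stay runnable it uses Python's built-in sqlite3 as a stand-in for Postgres, and the column layout of reembed_progress (model_version, last_row, started_at) is our assumption, not the bureau's actual schema. With a 50,000-row interval, a job that stalls at 38% (row 532,000) resumes from the row after its last completed checkpoint.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

CHECKPOINT_EVERY = 50_000


def open_progress_db() -> sqlite3.Connection:
    """In-memory stand-in for the Postgres table; the schema is hypothetical."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE reembed_progress (
               model_version TEXT    NOT NULL,
               last_row      INTEGER NOT NULL,  -- last row covered by this checkpoint
               started_at    TEXT    NOT NULL,  -- ISO 8601 timestamp of the checkpoint
               PRIMARY KEY (model_version, last_row)
           )"""
    )
    return db


def run_until(db, model_version, start_row, stop_row, clock):
    """Process rows start_row..stop_row, checkpointing after each full batch."""
    batch_start = start_row
    while batch_start + CHECKPOINT_EVERY - 1 <= stop_row:
        batch_end = batch_start + CHECKPOINT_EVERY - 1
        # The real job would embed and upsert rows batch_start..batch_end here.
        db.execute(
            "INSERT INTO reembed_progress VALUES (?, ?, ?)",
            (model_version, batch_end, clock.isoformat()),
        )
        db.commit()
        clock += timedelta(minutes=5)   # simulated time per batch
        batch_start = batch_end + 1


def resume_row(db, model_version) -> int:
    """First row that still needs embedding for this model version."""
    row = db.execute(
        "SELECT last_row FROM reembed_progress "
        "WHERE model_version = ? ORDER BY started_at DESC LIMIT 1",
        (model_version,),
    ).fetchone()
    return 1 if row is None else row[0] + 1


db = open_progress_db()
start = datetime(2026, 4, 26, 18, 0, tzinfo=timezone.utc)
run_until(db, "v2", 1, 532_000, start)   # the job dies at 38% of 1.4M rows
next_row = resume_row(db, "v2")
```

The rows between the last checkpoint and the stall get redone, which is the trade-off the checkpoint interval controls: a smaller interval wastes less work on resume but writes more progress rows.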

When we built the prior-art AI agent for that Eindhoven bureau, the thing we kept coming back to was that the vector database is the layer the model swap and the bewaarplicht both pass through. We solved it by treating the embedding as just another column in a table the team already owned, and writing the audit story once instead of twice.

Five-minute audit for your own setup: open the table that holds your embeddings, and check whether it has an embedded_at and a model_version column. If it doesn't, you don't have a replay story yet.

Key takeaway

For a 20-year audit trail under the Rijksoctrooiwet, the vector database your team already operates beats the one that runs 50ms faster.

FAQ

Does pgvector actually scale past a couple of million vectors?

Yes. HNSW in pgvector 0.7+ handles tens of millions of vectors on a single well-tuned node. The bureau's planning horizon is single-digit millions of vectors, which is squarely in the comfort zone.

Why didn't you include Qdrant or Milvus in the bake-off?

Both are good products. The deciding column was operational familiarity for a team of 1.5 engineers, and the bureau already ran Postgres. A second stateful system was the cost we refused to pay.

What happens to old embedding versions when you swap the model?

We keep them. The (claim_id, model_version) primary key means every historical retrieval can be replayed against the exact embeddings it was run on. That is the 20-year audit trail.

Is Pinecone the wrong choice for every regulated workload?

No. It is the wrong choice when the regulator wants a replay trail you can defend without vendor cooperation. For shorter retention or non-audited workloads, the speed-to-ship is genuinely useful.

How do you restart a stalled re-embedding job without re-doing the finished rows?

Checkpoint progress to a small table every N rows, key the job on (model_version, batch_id), and have the worker skip rows where that pair already exists. Resume becomes a SELECT, not a migration.
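A minimal sketch of the skip logic, with an in-memory set standing in for the progress table. The names run_job and embed_batch are illustrative, and the simulated crash exists only to show that a rerun does not repeat finished batches.

```python
def run_job(batch_ids, model_version, done, embed_batch, fail_on=None):
    """Embed each batch once; batches already recorded in `done` are skipped.

    `done` holds (model_version, batch_id) pairs and stands in for the
    progress table. `fail_on` simulates a crash before that batch is processed.
    """
    for batch_id in batch_ids:
        key = (model_version, batch_id)
        if key in done:
            continue   # finished in an earlier run
        if batch_id == fail_on:
            raise RuntimeError(f"simulated stall at batch {batch_id}")
        embed_batch(batch_id)
        done.add(key)  # record progress only after the work succeeded


calls = []   # every batch that was actually embedded, in order
done = set()
batches = range(5)

try:
    run_job(batches, "v2", done, calls.append, fail_on=3)
except RuntimeError:
    pass   # the Sunday-night stall

run_job(batches, "v2", done, calls.append)   # rerun, same arguments
```

Because the rerun takes the same arguments as the first run, the on-call engineer does not need to work out where the job stopped; restarting it is enough.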
