Databases

Postgres 19 query hints: 19 patterns we use for RAG

Sunday at 03:48. A nine-million-row pgvector table flips its plan and the retrieval p99 jumps from 180 ms to 4.2 seconds. Here is the cheatsheet we now ship.

Jacob Molkenboer· Founder · A Brand New Company· 17 Dec 2024· 7 min

Open oak index drawer with nineteen ledger tabs, one flagged green, beside folded paper, brass divider, red wax seal.

Sunday morning, 03:48 Amsterdam. A claims agent for a Dutch property insurer asks the retrieval layer for the three most relevant clauses in a 2014 fire-damage policy. The vector table holds nine million embedded paragraphs across fourteen years of contracts, addenda, and inspection notes. The Postgres planner flips the plan. The p99 jumps from 180 ms to 4.2 seconds. The customer has already opened a second support tab by the time the agent's first reply lands.

We have been stabilising that table for eight months. What we learned, sometimes painfully, is that the Postgres planner has opinions about HNSW recall versus sequential scans that do not always match ours, and those opinions shift between minor versions. The cheatsheet below is what we now ship inside every retrieval module. Nineteen patterns, grouped by how much fight the planner used to put up.

The pgsql-hackers thread around Postgres 19 query hints (front page of Hacker News this week) confirms what we already build for: the upstream community is finally pulling into core what the pg_hint_plan extension has done for fifteen years. That is good news. It is also the reason to write down what we have learned before the syntax fragments.

Numbers below come from the production system: pgvector 0.8, Postgres 17.4 on the live cluster, a Postgres 19 beta on staging, an HNSW index on a 1536-dimensional embedding column, four read replicas, a pool sized for ninety concurrent retrieval calls per second.

Six hints the planner used to drop on the floor

These six are pg_hint_plan style comment hints. In stock Postgres, the planner skips them. We load pg_hint_plan as a shared library, set pg_hint_plan.enable_hint = on, and the planner finally listens. In Postgres 19 the equivalent syntax moves into core, so the extension comes off our replica deploy checklist.

1. IndexScan: force the HNSW index

The single highest-impact hint. The planner sometimes decides a parallel sequential scan over nine million rows is cheaper than an HNSW probe, especially after VACUUM ANALYZE rewrites the row estimate. It is wrong about 4% of the time, but 4% of ninety calls per second is enough to wreck the tail.

/*+ IndexScan(c claims_embedding_hnsw) */
SELECT c.id, c.body, c.embedding <=> $1 AS distance
FROM claims c
WHERE c.policy_id = ANY($2)
ORDER BY c.embedding <=> $1
LIMIT 3;

2. NoSeqScan: block the fallback

Belt and braces with the IndexScan hint. If the index is briefly invalid during a REINDEX CONCURRENTLY, NoSeqScan makes the query fail loudly instead of silently degrading to a full table scan that finishes in twelve seconds and breaks the agent's response budget.

/*+ IndexScan(c claims_embedding_hnsw) NoSeqScan(c) */
SELECT ...

3. HashJoin: kill the nested loop on policy lookups

When the agent retrieves clauses for a customer with hundreds of active policies, the planner picks a nested loop with a memoize node. Memoize is brilliant when the inner side fits in work_mem. With our cardinality it spills, and a spilling memoize is worse than the hash it replaced.

/*+ HashJoin(c p) Leading((p c)) */
SELECT c.id, c.body, c.embedding <=> $2 AS distance
FROM policies p
JOIN claims c ON c.policy_id = p.id
WHERE p.customer_id = $1
ORDER BY c.embedding <=> $2
LIMIT 3;

4. Leading: pin the driving table

Always lead with the smaller relation. The planner usually figures this out, but after a partition swap on the policies table the statistics lag for hours and the join order goes backwards.

5. Memoize: when you actually want it

The mirror image of pattern 3. On low-cardinality fan-outs (one customer, a dozen policies, three clauses each) a pinned memoize keeps p50 under 40 ms.

/*+ Memoize(c) NestLoop(p c) */
SELECT ...

6. Parallel(hard): reserve the worker count

Without a hard pin, the planner negotiates with max_parallel_workers_per_gather at plan time. On a busy replica it picks zero workers and serialises. With Parallel(c 4 hard) it reserves four. The replica is fine; we sized the pool for it.

Warning

Hints are not policies. Every pattern above can starve another query under load. Stress-test the full workload with the hints in place before you ship them to a replica that also serves analytics or background jobs.

Seven session GUCs we lock per transaction

These have always worked. The planner reads them every query, no extension required. We set them with SET LOCAL inside the retrieval transaction so they revert at COMMIT and never leak into a pooled connection. The official reference lives in the PostgreSQL runtime-configuration docs.

7. enable_seqscan = off

The blunt instrument. We prefer the targeted NoSeqScan hint, but on long-tail queries against partitioned tables, the session switch is simpler to reason about and easier to revert.

8. max_parallel_workers_per_gather

Set to 4 for retrieval queries, 0 for the embedding ingest path. Parallel workers on an INSERT ... SELECT against a pgvector table will fight for the same WAL writer and double our write latency.

9. work_mem

64MB per retrieval transaction. The default 4MB forces the HNSW result heap to spill to disk. Caveat: work_mem is per sort or hash node, not per query. A query with three hash nodes uses 192MB.

BEGIN;
SET LOCAL work_mem = '64MB';
SET LOCAL max_parallel_workers_per_gather = 4;
SET LOCAL jit = off;
-- retrieval query here
COMMIT;

10. jit = off

JIT compilation costs 30 to 80 ms per query on our workload. The retrieval queries themselves run in 12 ms. The maths are easy.

11. plan_cache_mode = force_custom_plan

Prepared statements default to a generic plan after five executions. Our retrieval queries have wildly different parameter selectivity (one policy versus three hundred), so the generic plan is wrong about a third of the time. Forcing a custom plan adds 0.4 ms of planning overhead and removes the worst tail.

12. cursor_tuple_fraction = 1.0

If you stream results with a cursor (we do, for re-ranking) the default 0.1 tells the planner to optimise for the first 10% of rows. We want the full set ranked. Setting this to 1.0 gives us the all-rows plan.

13. random_page_cost = 1.1

The default is 4.0, which assumes spinning rust. On NVMe the cost ratio between random and sequential reads is closer to 1.05. Setting this to 1.1 makes the planner trust the HNSW index without us having to hint it on every call.

Six pgvector and safety-net knobs

14. hnsw.ef_search

The recall knob. Default is 40. We tuned to 80 for the insurer's claims agent, which lifted recall from 91% to 97.5% and added 11 ms of latency. Worth it for a claims workflow where the wrong clause is more expensive than a slower clause.

SET LOCAL hnsw.ef_search = 80;

15. ivfflat.probes

We keep an IVFFlat index alongside HNSW for the long-tail queries where HNSW recall collapses on out-of-distribution embeddings. probes=10 hits the right point on the recall and latency curve for our table.

16. effective_io_concurrency

Set to 200 on NVMe replicas. The default 1 tells the planner the storage is a single spindle. With 200, bitmap heap scans prefetch correctly and bitmap recheck on the embedding column drops from 80 ms to 18 ms.

17. statement_timeout

750 ms. Anything slower is not retrieval, it is a runaway plan. We catch the timeout in the application layer and fall back to a coarser keyword search. The customer sees a slightly less precise answer instead of nothing.

18. lock_timeout

50 ms. Retrieval should never block on a lock. If a maintenance job is rewriting the index, we fail fast and try a replica.

19. idle_in_transaction_session_timeout

2 seconds. The retrieval transaction wraps a single SELECT plus a handful of SET LOCAL calls. If it sits idle, something in the application logic has gone wrong, and we want the connection back in the pool before the pool starves.

BEGIN;
SET LOCAL idle_in_transaction_session_timeout = '2s';
SET LOCAL statement_timeout = '750ms';
SET LOCAL lock_timeout = '50ms';
SET LOCAL hnsw.ef_search = 80;
-- retrieval query
COMMIT;

Postgres 19 narrows the gap

The query-hints proposal landing in 19 covers the first six patterns above (IndexScan, NoSeqScan, HashJoin, Leading, Memoize, Parallel) as native syntax. The committers were explicit that this is a controlled subset, not a full port of pg_hint_plan's grammar. That is fine. Those six are the ones we cared about. The session GUCs in patterns 7 through 13 already work today on Postgres 12 and up. The pgvector knobs in 14 and 15 are extension-side and unaffected by the core change. So Postgres 19, for our workload, removes one shared-library dependency from every replica and lets us drop the pg_hint_plan deploy step.

It does not change the discipline. Hints are still a contract with the planner. Write them down, version them with the code, review them when the plan changes.

The five-minute audit

When we built the retrieval layer for this insurer's claims agent, the thing we ran into was that the planner's worst behaviour was invisible under p50. The whole problem lived in the long tail and only surfaced under concurrent load. We ended up wiring an EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) sampler that fires on 1% of retrieval calls and ships the plan to a column store. We catch plan flips within an hour now instead of when the support queue fills up. If you are running AI agents against a Postgres-backed knowledge base today, the smallest useful thing you can do this afternoon is wire that sampler. The first plan flip you catch will earn it back.

Key takeaway

Hints are a contract with the planner: write them down, version them with the code, and review every one of them whenever the plan changes.

FAQ

Do these hints work on Postgres 17 or only on Postgres 19?

Patterns 7 through 19 work on Postgres 12 and up. Patterns 1 through 6 need the pg_hint_plan extension on Postgres 17 or earlier, and land in core in Postgres 19.

Is HNSW always better than IVFFlat for a workload like this?

No. HNSW wins on average. IVFFlat with high probes wins on long-tail recall when the embedding distribution drifts away from training. We keep both indexes and route per query.

What did the latency look like before and after the cheatsheet?

The p99 went from 4.2 seconds during plan flips to a steady 240 ms. The p50 dropped from 180 ms to 95 ms after the work_mem, ef_search, and random_page_cost tuning.

Will SET LOCAL leak across a PgBouncer transaction-pooled connection?

No. SET LOCAL is scoped to the current transaction and reverts at COMMIT or ROLLBACK, which is exactly when PgBouncer releases the backend in transaction pooling.

ragarchitecturetoolingoperationsai agentsknowledge base

Building something?

Start a project