E-commerce search: how we pick Algolia, Typesense, or pgvector

How we score Algolia, Typesense, and a self-hosted pgvector hybrid when a Dutch e-commerce brand under €25M revenue needs product search that actually moves checkout.

Jacob Molkenboer · Founder · A Brand New Company · 10 Mar 2025 · 6 min

Tuesday afternoon. A Rotterdam fashion brand's catalog manager types "wijnrode jurk maat 38" into the staging search box and gets zero hits. The catalog has 47 wine-red dresses in size 38. The search index does not know "wijnrode" is the Dutch compound for "wine red," does not split "maat 38," and treats the whole query as one literal token.

This post is the spreadsheet we open when a brand under €25M asks us which of three search stacks to ship: Algolia, Typesense, or a self-hosted Postgres setup combining full-text search with pgvector for semantic recall. We score on three axes that survive real traffic: relevance lift on the catalog manager's worst 200 queries, who owns the synonym list when it grows past 800 entries, and whether the search request still completes inside a Dutch 3G window when a Galaxy A04 is on the IC train to Utrecht.

The three options at face value

Algolia is the hosted SaaS most teams already know. You ship a JSON index, you pay per search and per record, and you get a tuned ranking algorithm out of the box. Typesense is the open-source contender: a single C++ binary you can deploy yourself or buy hosted as Typesense Cloud. The pgvector route is the one you build: Postgres for the catalog, tsvector for keyword search, pgvector for embeddings, and a hybrid scorer that blends the two.
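A minimal sketch of what "a hybrid scorer that blends the two" could look like, written in Python rather than SQL for readability. The function names, the min-max normalisation, and the 0.6 keyword weight are illustrative assumptions, not the scorer we ship. In a real setup the keyword scores would typically come from Postgres `ts_rank` and the semantic scores from a pgvector similarity.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to the 0..1 range so both signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_rank(
    keyword: Dict[str, float],
    semantic: Dict[str, float],
    alpha: float = 0.6,  # assumed keyword weight; this is the knob you tune
) -> List[Tuple[str, float]]:
    """Blend keyword and embedding scores, best product first."""
    kw, sem = min_max(keyword), min_max(semantic)
    ids = set(kw) | set(sem)
    blended = {
        i: alpha * kw.get(i, 0.0) + (1 - alpha) * sem.get(i, 0.0) for i in ids
    }
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical scores for three products (higher is better in both dicts).
keyword_scores = {"dress-001": 0.9, "dress-002": 0.3}
semantic_scores = {"dress-002": 0.8, "dress-003": 0.7, "dress-001": 0.2}
ranking = hybrid_rank(keyword_scores, semantic_scores)
```

A product that only one of the two signals found still gets ranked, with a zero from the signal that missed it. That is a design choice; dropping such products is equally defensible.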

None of these is the right answer by default. The pick is downstream of the brand's headcount, catalog volatility, and where their customers actually shop from.

The scoring sheet

We score each option from 1 to 5 on six dimensions and weight them, putting the most weight on the three the brand told us matter most. The dimensions are relevance lift, synonym ownership, p95 latency at 3G, cost at projected query volume, operational complexity, and data residency. A perfect unweighted score is 30, six dimensions at five points each. Anything under 18 fails the audit and we recommend nothing.

Dimension                Weight  Algolia  Typesense  PG hybrid
Relevance lift (200q)    25%     5        4          3
Synonym ownership        15%     3        4          5
p95 latency at 3G        20%     4        4          3
Cost at 200k queries/mo  15%     2        4          5
Operational complexity   15%     5        3          2
Data residency (EU)      10%     3        4          5

Those numbers are the median of the last eight audits we ran. They shift per brand. A 6,000-SKU jewellery store with one part-time merchandiser scores differently than a 180,000-SKU industrial parts catalog with a dedicated data team.
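The arithmetic behind the sheet, using the median scores and weights from the table. The text gives a 30-point total and per-dimension weights without spelling out how the two combine, so this sketch computes both the raw sum (out of 30) and the weighted average (out of 5). The dictionary keys and function names are ours, chosen for illustration.

```python
from typing import Dict

WEIGHTS: Dict[str, float] = {
    "relevance_lift": 0.25,
    "synonym_ownership": 0.15,
    "latency_3g": 0.20,
    "cost": 0.15,
    "ops_complexity": 0.15,
    "data_residency": 0.10,
}

# Median scores from the table, in the same dimension order as WEIGHTS.
SCORES: Dict[str, Dict[str, int]] = {
    "Algolia": dict(zip(WEIGHTS, [5, 3, 4, 2, 5, 3])),
    "Typesense": dict(zip(WEIGHTS, [4, 4, 4, 4, 3, 4])),
    "PG hybrid": dict(zip(WEIGHTS, [3, 5, 3, 5, 2, 5])),
}

PASS_THRESHOLD = 18  # raw total below this fails the audit


def raw_total(scores: Dict[str, int]) -> int:
    """Unweighted sum, out of 30."""
    return sum(scores.values())


def weighted_score(scores: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted average, out of 5."""
    return sum(scores[dim] * w for dim, w in weights.items())


summary = {
    name: (raw_total(s), round(weighted_score(s, WEIGHTS), 2))
    for name, s in SCORES.items()
}
```

On the median numbers the three stacks land within a point of each other on the raw total, which is why the per-brand weights decide the pick.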

Relevance lift on the worst 200 queries

The honest measurement looks like this. Pull the brand's last 90 days of internal search logs. Sort by zero-result and one-click-and-bounce queries. Take the worst 200. Hand them to a junior merchandiser and ask them to mark, for each query, what the right top-three results should be. That is the ground truth set.
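A sketch of the selection step, assuming the search log can be exported as rows with `query`, `result_count`, `clicks`, and `bounced` fields. That schema is hypothetical; map it to whatever your analytics tool actually exports.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def is_failure(row: dict) -> bool:
    """Zero results, or exactly one click followed by a bounce."""
    zero_result = row["result_count"] == 0
    click_and_bounce = row["clicks"] == 1 and row["bounced"]
    return zero_result or click_and_bounce


def worst_queries(log: Iterable[dict], limit: int = 200) -> List[Tuple[str, int]]:
    """Return the queries with the most failed searches, worst first."""
    failures = Counter(
        " ".join(row["query"].lower().split())  # fold case and stray spaces
        for row in log
        if is_failure(row)
    )
    return failures.most_common(limit)


# Hypothetical log rows.
log = [
    {"query": "wijnrode jurk maat 38", "result_count": 0, "clicks": 0, "bounced": False},
    {"query": "Wijnrode jurk  maat 38 ", "result_count": 0, "clicks": 0, "bounced": False},
    {"query": "gympen", "result_count": 12, "clicks": 1, "bounced": True},
    {"query": "jurk", "result_count": 40, "clicks": 3, "bounced": False},
]
worst = worst_queries(log)
```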

Then run each search stack against the same 200 queries and measure nDCG@3 against the ground truth. We have yet to see Algolia score lower than 0.78 on a Dutch catalog when the synonym dictionary is in place. Typesense lands around 0.72 with stock configuration and 0.81 once you add Dutch stemming and a curated synonyms file. The pgvector hybrid scores 0.69 cold and climbs to 0.83 once you tune the keyword-vs-embedding blend, but the climb takes roughly two engineer-weeks and a labelled training set.
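For reference, nDCG@3 can be computed in a few lines. This sketch assumes binary relevance: a returned product counts as relevant if it is anywhere in the merchandiser's top three for that query. Grading by position is a reasonable alternative. The function names and product IDs are illustrative.

```python
import math
from typing import Dict, List, Set


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain: later ranks count for less."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_3(returned: List[str], relevant: Set[str]) -> float:
    """nDCG@3 with binary relevance against a ground-truth set."""
    gains = [1.0 if doc in relevant else 0.0 for doc in returned[:3]]
    ideal_dcg = dcg([1.0] * min(3, len(relevant)))
    return dcg(gains) / ideal_dcg if ideal_dcg else 0.0


def mean_ndcg(results: Dict[str, List[str]], truth: Dict[str, Set[str]]) -> float:
    """Average nDCG@3 over every query in the ground-truth set."""
    if not truth:
        return 0.0
    return sum(ndcg_at_3(results.get(q, []), rel) for q, rel in truth.items()) / len(truth)


# Hypothetical ground truth and one stack's output for a single query.
truth = {"wijnrode jurk maat 38": {"p1", "p2", "p3"}}
results = {"wijnrode jurk maat 38": ["p1", "p9", "p2"]}
score = mean_ndcg(results, truth)
```

A query the stack returns nothing for scores zero, which is what you want when the test set is built from zero-result queries.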

Takeaway

Cold-start relevance favours Algolia. Ceiling relevance, once you put the work in, favours the pgvector hybrid. Typesense is the option that gets you within 5% of either without the bill or the engineering cost.

Who owns the synonym list

This is the dimension most vendor decks skip. A Dutch fashion brand will accumulate 600 to 1,200 synonym rules in the first year. "Wijnrood" maps to "bordeaux" and "donkerrood." "Sneakers" maps to "gympen" and "schoenen sport." "Maat 38" maps to "EU 38" and "UK 5.5." Whoever owns this list owns the search.
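To make the rules concrete, here is a small sketch of query-time synonym expansion using the three examples above. The one-way mapping, the `expand_query` name, and the exact-token matching are assumptions for illustration; each engine has its own synonym semantics.

```python
from typing import Dict, List

SYNONYMS: Dict[str, List[str]] = {
    "wijnrood": ["bordeaux", "donkerrood"],
    "sneakers": ["gympen", "schoenen sport"],
    "maat 38": ["EU 38", "UK 5.5"],
}


def expand_query(query: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Return the normalised query plus one variant per matching synonym."""
    normalised = " ".join(query.lower().split())
    padded = f" {normalised} "  # pad so rules only match whole tokens
    variants = [normalised]
    for key, alternatives in synonyms.items():
        needle = f" {key} "
        if needle in padded:
            for alt in alternatives:
                variants.append(padded.replace(needle, f" {alt.lower()} ").strip())
    return variants


variants = expand_query("Wijnrood jurk maat 38", SYNONYMS)
```

Note that the inflected "wijnrode" from the opening example would not match the "wijnrood" rule here. Exact-token matching needs either stemming in front of it or a separate rule per inflection, which is one reason the list grows so fast.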

With Algolia, the synonym dashboard is in the admin UI. The catalog manager can edit it without a deploy. Good for velocity, brittle for change tracking, and the export format is proprietary. With Typesense, synonyms live in a JSON file you commit to git. Good for change tracking, bad if your catalog manager does not work in git. With the pgvector route, synonyms are rows in a Postgres table you read at query time. You build the editing UI yourself, which means you also own the workflow.

The right answer here is the one the brand can actually maintain. We have seen Typesense fail in production at a brand whose merchandiser refused to learn git, and pgvector fail at a brand that never built the admin UI, so the synonym table grew stale and search quality decayed in silence.

Conversion at 3G

The KPN coverage map says 5G is everywhere. The catalog manager who tests the site on the IC from Amsterdam to Groningen knows otherwise. Real Dutch mobile traffic still drops to HSPA+ and slow 4G in stretches of the Randstad, all of Zeeland, and most of the north. If your search request takes 1.4 seconds at p95 over throttled 3G, you lose the checkout from the train.

Algolia's CDN edge in Frankfurt gives us a p95 of 180ms to 240ms to a Galaxy A04 throttled to "slow 3G" in Chrome DevTools. Typesense Cloud on the EU node lands at 220ms to 290ms. A self-hosted pgvector setup running on the brand's own VPS in Amsterdam, behind a Cloudflare cache, lands at 280ms to 380ms cold and 90ms warm. The pgvector option is the fastest when the result is cached and the slowest when it is not.
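For anyone reproducing these numbers, p95 is simply the latency that 95% of requests stay at or under. A minimal sketch using the nearest-rank method follows; the choice of method is an assumption, and other percentile definitions give slightly different values on small samples.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], pct: float = 95.0) -> float:
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical measurements: 100 requests between 101 ms and 200 ms.
samples = list(range(101, 201))
p95 = percentile(samples)
```

One slow outlier in a small sample moves p95 a lot, so collect a few hundred requests per stack before comparing.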

What this means for conversion: we measured a 6.1% lift in add-to-cart rate at one client when we cut search response p95 from 520ms to 210ms. The cause was not search quality. It was the keyboard not freezing while the user typed.

Data residency and the retention question

One question we now ask in every audit: where does the query text go, and who keeps a copy? Internal search queries leak intent that the product detail page never does. A shopper who searches for "valpartij hulpmiddel" tells you something a category-tree click would not. Algolia's EU region keeps queries in Frankfurt and Paris and is GDPR-aligned for processor terms, but the search analytics stream is logged on Algolia infrastructure. Typesense self-hosted leaves nothing outside the brand's VPC. The pgvector route is fully on-premises if you want it to be.

This matters most for brands selling regulated goods, anything age-gated, or B2B catalogs where the search query itself reveals procurement intent. For a general fashion brand it usually does not. Score it accordingly.

When each option wins

Algolia wins when the brand has a small team, a stable catalog under 50,000 SKUs, and the cash to pay €600 to €2,400 per month without flinching. The dashboard runs itself, the relevance is great cold, and the catalog manager edits synonyms over coffee.

Typesense wins when the brand has one engineer who can run Docker, a catalog manager who tolerates a git workflow, and a catalog between 20,000 and 300,000 SKUs. It is the option that scales linearly with hosting cost rather than with query volume.

The pgvector hybrid wins when the brand already runs Postgres at scale, has at least one search-curious backend engineer, and has a catalog with rich attribute data the embeddings can learn from. The ceiling is the highest. The runway to get there is the longest.

When we built the product-discovery search agent for a Dutch home-goods brand, the synonym table went stale within four months and we ended up shipping a small admin UI on top of the Postgres synonyms table so the merchandiser could edit it from the same dashboard she used for stock levels. The migration to pgvector was the easy part.

The smallest thing you can do today

Open your shop's internal search analytics. Sort queries by zero-result rate over the last 30 days. Read the top 50 out loud. If more than ten of them are spelling variants, compound words, or synonyms the index does not know, you have a synonym problem before you have a vendor problem. Fix that first and you may not need to migrate at all.
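If your analytics tool does not offer that sort directly, it is a short script. This sketch assumes an export with `query`, `result_count`, and `searched_at` fields, which is a hypothetical schema, and ranks queries by zero-result rate inside a 30-day window.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def zero_result_rates(
    log: Iterable[dict], now: datetime, days: int = 30, top: int = 50
) -> List[Tuple[str, float, int]]:
    """Return (query, zero-result rate, searches), highest rate first."""
    cutoff = now - timedelta(days=days)
    searches = defaultdict(int)
    zero_hits = defaultdict(int)
    for row in log:
        if row["searched_at"] < cutoff:
            continue  # outside the window
        query = " ".join(row["query"].lower().split())
        searches[query] += 1
        if row["result_count"] == 0:
            zero_hits[query] += 1
    rows = [(q, zero_hits[q] / n, n) for q, n in searches.items()]
    # Ties on rate are broken by volume so busy queries surface first.
    rows.sort(key=lambda r: (-r[1], -r[2]))
    return rows[:top]


# Hypothetical rows; the last one falls outside the 30-day window.
now = datetime(2025, 3, 10)
log = [
    {"query": "wijnrode jurk", "result_count": 0, "searched_at": datetime(2025, 3, 1)},
    {"query": "wijnrode jurk", "result_count": 0, "searched_at": datetime(2025, 3, 2)},
    {"query": "gympen", "result_count": 0, "searched_at": datetime(2025, 3, 3)},
    {"query": "gympen", "result_count": 5, "searched_at": datetime(2025, 3, 4)},
    {"query": "jurk", "result_count": 0, "searched_at": datetime(2025, 1, 1)},
]
report = zero_result_rates(log, now)
```

Judging whether a query is a spelling variant, a compound, or a missing synonym still takes a human who knows the catalog. The script only tells you which 50 to read.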

Key takeaway

Cold-start relevance favours Algolia. Ceiling relevance favours pgvector. Typesense is the option that gets you within 5% of either without the bill or the engineering cost.

FAQ

Why not just default to Algolia for every brand?

Cost scales mostly with query volume rather than catalog size. Past 200k searches per month, monthly bills cross €2k. A brand under €25M revenue can usually find that money, but the same budget often buys more when spent on engineering hours.

Does the pgvector hybrid actually beat Algolia on relevance?

Only after two engineer-weeks of tuning the keyword-vs-embedding blend and labelling a ground truth set. Cold-start, Algolia wins. Tuned ceiling, pgvector wins. Most brands never reach the ceiling.

Can a non-technical catalog manager edit Typesense synonyms?

Not directly. Synonyms live in a JSON file you commit to git. If your merchandiser does not work in git, you need to build an admin UI on top, which erases the cost advantage versus Algolia.

How much does search latency actually move conversion?

At one client, cutting search p95 from 520ms to 210ms on throttled 3G lifted add-to-cart by 6.1%. The cause was not relevance. It was the keyboard not freezing while users typed.

Is EU data residency a real concern for a Dutch shop?

For general fashion or retail, rarely. For regulated goods, B2B procurement catalogs, or age-gated products, yes. Self-hosted Typesense or pgvector keeps queries inside your own VPC.

e-commerce · architecture · tooling · integrations · strategy
