Picking a product Q&A chat agent: the SME stack method

How a sub-€20M Dutch shop should choose between Shopify-native AI, self-hosted Inkeep, and a custom Claude Agent SDK build when 9,400 product questions arrive every month.

Jacob Molkenboer · Founder · A Brand New Company · 21 Aug 2025 · 9 min

It is Tuesday afternoon. You run operations for a Dutch online shop that did €8M last year in specialty cookware. The CS inbox just crossed 9,400 product questions a month, and your two reps spend the day answering the same six things: induction compatibility, pan diameter, whether you ship to Belgium, the return window, gift-wrap, and whether the new lids fit the older series. The CEO read a LinkedIn post about Shopify Sidekick over the weekend. Your CTO sent over an Inkeep demo. A Slack DM from your agency proposed "just build it on the Claude Agent SDK." Peak Q3 starts in three weeks, and the board wants a chat agent live before it does.

Three weeks is enough to ship one of these. Picking the wrong one is enough to lose the season. Here is the method we use when a sub-€20M Dutch shop has to choose between a Shopify-native stack, a self-hosted Inkeep deployment, and a custom build on the Claude Agent SDK.

The three options that survive a first pass

You can buy a chat agent off a hundred app stores. Most of them are a thin wrapper around a vector store and a fixed prompt. For a real product-Q&A workload at 9,400 threads a month, three architectures actually compete.

Shopify-native AI. Sidekick handles merchant automation inside the admin. On the storefront, the Shop Channel and the Inbox AI layer cover customer Q&A. Setup is hours, not weeks. The trade-offs are real: you cannot pick the model, you cannot ship a Dutch tone you actually like, and your conversation trace lives behind Shopify's UI.

Inkeep + Postgres, self-hosted. Inkeep is built for product Q&A out of the box: retrieval, citations, fallback to humans. Self-hosted on a small EU VPS with Postgres for traces and pgvector for the index keeps the data inside the union. Setup is two to three weeks if your product catalog is already clean, longer if it is not.

Custom Claude Agent SDK build. You write the loop. You define the tools: price lookup, stock lookup, return policy, shipping calculator, induction compatibility. You own the trace store and the viewer. The ceiling is high; the time-to-live is four to eight weeks. This is the option you reach for when product reasoning is the hard part, not retrieval.

The per-conversation math at 9,400 threads

Pitch decks always quote per-conversation cost. The decks are wrong, because they leave out infra and the first-month spike when the bot is wrong about everything.

Honest steady-state numbers for 9,400 conversations a month, average trace length of six to eight turns, one or two tool calls each:

  • Shopify-native: bundled in your plan. Marginal cost per conversation is effectively zero. The real cost is the Shopify Plus upgrade if your current tier does not include the AI layer, which is a fixed line item that does not scale with conversation volume.
  • Inkeep self-hosted: infra around €250 to €400 per month for a small VPS, managed Postgres, and the vector index. Model spend, with prompt caching and a mix of a cheap classifier and Claude Sonnet 4.5 for the answer, lands near €0.03 per conversation. Total: roughly €0.06 per conversation at this volume.
  • Custom Claude Agent SDK: infra €150 to €300 per month. Model spend €0.04 to €0.07 per conversation depending on how many tool calls you allow. The number nobody writes down is engineering time. Amortise eight weeks of senior build work over twelve months and you are looking at €1.50 to €3 per conversation in year one, then it falls off a cliff in year two.
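To make the steady-state arithmetic explicit, here is a minimal sketch that spreads the fixed infra bill over 9,400 conversations and adds model spend. The function name and the dictionary layout are our own; the figures are the ranges quoted in the bullets. Engineering time is deliberately left out, so this is the steady-state view only.

```python
CONVERSATIONS_PER_MONTH = 9_400  # volume used throughout this article


def cost_per_conversation(infra_eur_per_month: float,
                          model_eur_per_conversation: float,
                          conversations: int = CONVERSATIONS_PER_MONTH) -> float:
    """Steady-state cost: fixed infra spread over volume, plus model spend."""
    return infra_eur_per_month / conversations + model_eur_per_conversation


# (infra low, infra high, model low, model high), taken from the bullets above
options = {
    "inkeep_self_hosted": (250, 400, 0.03, 0.03),
    "custom_agent_sdk": (150, 300, 0.04, 0.07),
}

for name, (infra_lo, infra_hi, model_lo, model_hi) in options.items():
    low = cost_per_conversation(infra_lo, model_lo)
    high = cost_per_conversation(infra_hi, model_hi)
    print(f"{name}: EUR {low:.3f} to {high:.3f} per conversation")
```

On these inputs the Inkeep range comes out at about €0.057 to €0.073 and the custom build at about €0.056 to €0.102, both before any engineering time is amortised.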

If you only look at the steady-state column, the answer is always "Shopify-native, it is free." If you look at year one, Inkeep wins by a wide margin. If you look at the customer-satisfaction column, custom often wins, because the agent can actually reason about your catalog instead of guessing from a snippet. Pick the column your CEO will measure.

Return rate impact in the first 90 days

Product-Q&A agents reduce returns the way good station signage reduces train delays: by setting expectations before someone commits. The mechanism is boring. A customer asks "will this pan work on induction?", the bot says yes or no with a citation to the spec sheet, the customer buys the right pan or does not buy, and you do not ship a wrong pan to Maastricht and back.

Across a small sample of Dutch e-commerce clients, we have seen correct, sourced answers to pre-purchase questions correlate with a 6% to 12% drop in the return rate on the SKUs the bot covered, measured over 90 days against the prior quarter as the baseline. The first two weeks are usually noisy and sometimes negative, because the bot is confidently wrong about one specific thing and customers act on the wrong information. Plan for that.

Warning

If your bot is wrong about return policy or shipping cost in the first month, returns will spike, not drop. Gate those two topics behind a human handoff until you have logged 200 traces and a CS lead has read them.
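One way to implement that gate is a keyword check in front of the agent. This is a rough sketch, not a production classifier: the keyword lists, the function names, and the idea of counting reviewed traces in a single integer are all our assumptions for illustration.

```python
from typing import Optional

# Hypothetical keyword lists (English and Dutch); tune them on your own inbox.
GATED_KEYWORDS = {
    "return_policy": ("retour", "return", "refund", "terugsturen"),
    "shipping_cost": ("shipping", "verzending", "verzendkosten", "bezorgkosten"),
}
REVIEWED_TRACES_REQUIRED = 200  # threshold from the warning above


def gated_topic(message: str) -> Optional[str]:
    """Return the gated topic a message touches, or None if it touches neither."""
    text = message.lower()
    for topic, keywords in GATED_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return topic
    return None


def should_hand_off(message: str, reviewed_traces: int) -> bool:
    """Route gated topics to a human until enough traces have been reviewed."""
    return (gated_topic(message) is not None
            and reviewed_traces < REVIEWED_TRACES_REQUIRED)
```

Substring matching is crude and will over-trigger, which is the right failure mode for the first month: a needless handoff costs a rep two minutes, a wrong shipping promise costs more.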

The 90-day return-rate improvement is the metric your CEO will ask about. The metric your finance lead will ask about is gross-margin recovery: every return you avoid on a SKU with a €40 round-trip shipping-and-handling cost saves the full €40, so a 6% drop in returns works out to €2.40 saved for each return you used to have.
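The margin arithmetic is easy to get wrong by a factor of the drop rate, so here is a small sketch. The function name is ours and the baseline of 100 returns is a made-up round number; the 6% and €40 come from the paragraph above.

```python
def return_savings(baseline_returns: int,
                   drop_rate: float,
                   cost_per_return_eur: float) -> dict:
    """Savings from a relative drop in returns on the covered SKUs."""
    avoided = baseline_returns * drop_rate   # returns that no longer happen
    saved = avoided * cost_per_return_eur    # each one saves the full round trip
    return {
        "avoided_returns": avoided,
        "saved_eur": saved,
        "saved_per_baseline_return_eur": saved / baseline_returns,
    }


# Hypothetical quarter: 100 returns before the bot, 6% fewer after, EUR 40 each.
example = return_savings(baseline_returns=100, drop_rate=0.06,
                         cost_per_return_eur=40.0)
print(example)
```

With those inputs, six returns are avoided and €240 is recovered: €40 for each avoided return, or €2.40 averaged over every return in the baseline.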

Who reads the trace when a customer claims free shipping

This is the question most architecture write-ups skip. It is also the question that decides whether your legal team lets you ship the project.

Under Dutch consumer law, statements your agent makes to a buyer can bind you to the offer, the same way a sales rep promising a discount over the phone binds the company. If a customer in Antwerp screenshots a chat where the bot said "shipping to Belgium is free," someone needs to read the trace, decide if the screenshot is real, and decide whether to honour it. By Friday. Often before the customer files a chargeback with their card issuer.

That is a workflow question, not a model question. Score each option on it.

  • Shopify-native: the trace lives in the Shopify admin. CS reps can see it. Export is limited. If you need to hand a trace to your lawyer, you screenshot it.
  • Inkeep self-hosted: the trace lives in your Postgres. You decide who can SELECT on it. Your CS lead can have a small internal dashboard with read-only access, and you can find every trace that mentions "shipping" or "verzending" in under a second.
  • Custom Claude Agent SDK: you built the trace store, so you build the viewer. This sounds like a downside but it is often where custom wins: your CS lead opens a Linear-style inbox of flagged traces, not a generic chat log.

If you keep the trace data yourself, GDPR follows, and GDPR means a retention policy. The cheapest policy to maintain at this volume is monthly partitions in Postgres, dropped on schedule. There was a popular thread on Hacker News this week arguing the only scalable delete in Postgres is DROP TABLE, which is exactly the lever you want for a chat-trace store: keep three months hot, and remove each older month by dropping its partition in a single statement instead of deleting rows. The Postgres partitioning docs are the right starting point.

CREATE TABLE chat_trace (
  id          bigserial,
  thread_id   uuid        NOT NULL,
  created_at  timestamptz NOT NULL,
  role        text        NOT NULL,
  content     text        NOT NULL,
  tool_name   text,
  tool_args   jsonb,
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

CREATE TABLE chat_trace_2026_06 PARTITION OF chat_trace
  FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');

CREATE INDEX ON chat_trace_2026_06 (thread_id);
CREATE INDEX ON chat_trace_2026_06 USING gin (to_tsvector('dutch', content));

-- Three months later, once this month has aged out of the retention
-- window, drop the whole partition as a metadata-only operation:
DROP TABLE chat_trace_2026_06;

Five SQL statements. The difference between a happy DBA and a 4am incident in month thirteen.
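"Dropped on schedule" needs something to decide which partitions have aged out. A minimal sketch of that decision, assuming partitions follow the chat_trace_YYYY_MM naming used above and that "three months hot" means the current month plus the two before it; the function names are ours, and a real job would read the partition list from the database catalog.

```python
from datetime import date
from typing import List


def month_index(year: int, month: int) -> int:
    """Map a calendar month to a running integer so months can be compared."""
    return year * 12 + (month - 1)


def expired_partitions(existing: List[str], today: date,
                       keep_months: int = 3) -> List[str]:
    """Return partition names older than the retention window, oldest first."""
    # Newest month that is already outside the window.
    cutoff = month_index(today.year, today.month) - keep_months
    expired = []
    for name in existing:
        # "chat_trace_2026_06" -> ["chat_trace", "2026", "06"]
        year_str, month_str = name.rsplit("_", 2)[1:]
        if month_index(int(year_str), int(month_str)) <= cutoff:
            expired.append(name)
    return sorted(expired)


partitions = [f"chat_trace_2026_{m:02d}" for m in range(5, 10)]
for name in expired_partitions(partitions, today=date(2026, 9, 15)):
    print(f"DROP TABLE {name};")
```

Run in mid-September 2026, this prints drop statements for the May and June partitions and leaves July, August, and September in place.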

The decision matrix we actually use

Score on five dimensions, weighted to your situation.

  1. Time to first useful trace (days). Shopify-native wins this by two weeks or more.
  2. Per-conversation steady-state cost (euro). All three converge in year two; only the cap-ex differs.
  3. Catalog reasoning ceiling (how weird your product questions get). Custom wins this for anything with specs, compatibility, or configuration. Generic agents are fine for fashion and gifts; they struggle with cookware, electronics, and B2B parts.
  4. Trace accessibility (who can read what the bot actually said on Friday at 3pm). Inkeep and Custom win, by a wide margin.
  5. Compliance posture (where the data sits, who can be subpoenaed). Self-hosted EU wins, with a paper trail you can hand a regulator.

Multiply each score by how much your CEO actually cares about that row. For most sub-€20M NL shops we have worked with, the order ends up: Inkeep on a small VPS first, custom Claude Agent SDK when the catalog reasoning is non-trivial, Shopify-native when you genuinely have only three weeks and a catalog that fits a generic agent. We rarely pick the last one for shops above €5M. Pick by who reads the trace on Friday at 3pm, not by who has the slickest demo on Tuesday.
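The weighting step is just a weighted sum per option. In the sketch below, every score (1 to 5) and every weight is a made-up illustration, not a measurement; plug in your own before reading anything into the ranking.

```python
# Illustrative weights: how much this particular shop cares about each row.
WEIGHTS = {
    "time_to_first_trace": 3,
    "steady_state_cost": 2,
    "reasoning_ceiling": 2,
    "trace_accessibility": 3,
    "compliance_posture": 2,
}

# Illustrative 1-5 scores per option and dimension (higher is better).
SCORES = {
    "shopify_native": {
        "time_to_first_trace": 5, "steady_state_cost": 5,
        "reasoning_ceiling": 2, "trace_accessibility": 2,
        "compliance_posture": 2,
    },
    "inkeep_self_hosted": {
        "time_to_first_trace": 3, "steady_state_cost": 4,
        "reasoning_ceiling": 3, "trace_accessibility": 5,
        "compliance_posture": 5,
    },
    "custom_agent_sdk": {
        "time_to_first_trace": 1, "steady_state_cost": 3,
        "reasoning_ceiling": 5, "trace_accessibility": 5,
        "compliance_posture": 5,
    },
}


def weighted_total(scores: dict, weights: dict) -> int:
    """Sum of score times weight over every dimension in the matrix."""
    return sum(scores[dim] * weight for dim, weight in weights.items())


ranking = sorted(SCORES, key=lambda opt: weighted_total(SCORES[opt], WEIGHTS),
                 reverse=True)
for option in ranking:
    print(option, weighted_total(SCORES[option], WEIGHTS))
```

With these particular numbers Inkeep scores 48, custom 44, and Shopify-native 39. Raise the weight on catalog reasoning and custom moves to the top, which is the point of doing the exercise with your own weights.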

The five-minute audit you can do today

Open your CS inbox. Tag the last 200 questions by topic. If 60% of them are answerable from your product spec sheets and your shipping page, an agent will pay for itself by week six. If 60% of them are "where is my order" chasing PostNL, an agent will not solve your problem; a better order-status page will.
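If the tagging ends up in a spreadsheet export, the tally takes a few lines. The tag names and the sample split below are invented for illustration; only the 200-question sample size and the 60% threshold come from the audit above.

```python
from collections import Counter
from typing import Iterable

# Hypothetical tags for questions an agent could answer from existing pages.
ANSWERABLE_TAGS = {"product_spec", "shipping_page"}
THRESHOLD = 0.60  # share named in the audit above


def answerable_share(tags: Iterable[str]) -> float:
    """Fraction of tagged questions answerable from spec sheets or shipping page."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tagged questions to audit")
    return sum(counts[tag] for tag in ANSWERABLE_TAGS) / total


# Made-up sample of 200 tagged questions.
sample = (["product_spec"] * 90 + ["shipping_page"] * 35
          + ["order_status"] * 60 + ["other"] * 15)
share = answerable_share(sample)
print(f"{share:.1%} answerable, agent worth piloting: {share >= THRESHOLD}")
```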

When we built the product-Q&A agent for a Dutch cookware shop last quarter, the thing we ran into was that the catalog had eleven different lid-to-pan compatibility rules buried in three places, and no human had ever seen them as one list. We ended up extracting them into a small Postgres table the agent could query as a tool, which is the kind of plumbing that decides whether building AI agents for shops this size actually saves anyone time.
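To give a feel for that plumbing, here is a sketch of a rules table exposed as a lookup function an agent could call as a tool. It uses an in-memory SQLite database purely as a stand-in for Postgres so the example runs anywhere; the table layout, the series names, and the function names are all hypothetical, not the client's actual schema.

```python
import sqlite3
from typing import Optional


def build_rules_db() -> sqlite3.Connection:
    """Create an in-memory rules table with a few invented example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE lid_compat (
            lid_series  TEXT    NOT NULL,
            pan_series  TEXT    NOT NULL,
            diameter_cm INTEGER NOT NULL,
            fits        INTEGER NOT NULL,  -- 1 = fits, 0 = does not fit
            PRIMARY KEY (lid_series, pan_series, diameter_cm)
        )
        """
    )
    conn.executemany(
        "INSERT INTO lid_compat VALUES (?, ?, ?, ?)",
        [
            ("nova", "nova", 24, 1),
            ("nova", "classic", 24, 1),
            ("nova", "classic", 28, 0),
        ],
    )
    return conn


def lid_fits(conn: sqlite3.Connection, lid_series: str,
             pan_series: str, diameter_cm: int) -> Optional[bool]:
    """Return True or False when a rule exists, None when no rule covers the pair."""
    row = conn.execute(
        "SELECT fits FROM lid_compat "
        "WHERE lid_series = ? AND pan_series = ? AND diameter_cm = ?",
        (lid_series, pan_series, diameter_cm),
    ).fetchone()
    return None if row is None else bool(row[0])


conn = build_rules_db()
print(lid_fits(conn, "nova", "classic", 24))
```

Returning None for an uncovered combination matters more than the happy path: it gives the agent an explicit "no rule on file" signal it can turn into a handoff instead of a guess.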

Key takeaway

Pick your product-Q&A chat-agent stack by who reads the trace on Friday at 3pm, not by whose demo looks slickest on Tuesday.

FAQ

How long does each chat-agent stack take to ship?

Shopify-native is hours to days. A self-hosted Inkeep deployment is two to three weeks if your catalog is clean. A custom Claude Agent SDK build is typically four to eight weeks for a careful first version.

What does a product-Q&A conversation cost at 9,400 threads a month?

Roughly €0 marginal for Shopify-native, near €0.06 for Inkeep self-hosted including infra, and €0.04 to €0.07 in model spend for a custom Claude Agent SDK build, before €150 to €300 a month of infra and before amortising engineering time.

Should we keep every chat trace forever?

No. Set a retention window (90 or 180 days is typical for product Q&A), partition the table by month, and drop old partitions on schedule. Long retention is a GDPR liability, not an asset.

When does Shopify-native AI run out of room?

When your products have compatibility rules, configurations, or specs the buyer cares about. Cookware, electronics, and B2B parts all hit the ceiling fast; fashion and gifts rarely do.
