
Magento 2.3 chat agent: 1,680 custom builds before 17:00


Jacob Molkenboer · Founder · A Brand New Company · 18 Jun 2026 · 10 min


It's 16:42 in Apeldoorn. The Rotterdam workshop locks tomorrow's production slot in eighteen minutes. The chat agent has eight maatwerk (made-to-measure) questions queued, two of them about a fabric the shop discontinued in 2019. By 17:00 every customer either holds a delivery promise or sits in the planner queue with a flagged note. The team is 24 people. Last year the same 16:42 belonged to one increasingly tired customer-service lead with three browser tabs and a Magento back-office that takes nine seconds to refresh.

The stack, on paper, is one no consultant would touch: Magento 2.3.7 on PHP 7.2, a custom PHP fabric library, MySQL 5.7. The agent now handles 1,680 maatwerk conversations a week. None of them require a human to type the first answer. The interesting work was not the language model. It was the contract between the agent, the Rotterdam workshop calendar, and a thirteen-year-old database that nobody is ready to retire.

The stack we inherited

Magento 2.3 was retired by Adobe in September 2022; security advisories have been catalogued ever since on the Adobe Commerce security bulletins page. PHP 7.2 lost security support in November 2020, as documented on the PHP supported versions page. The owner already knew. He also knew that the site converted, that Stripe payments went through, and that the Rotterdam workshop knew which row in fabric_stock_extended mapped to which roll in which rack. The next eighteen months were not going to be about refactoring the cart. They were going to be about volume. We were not going to be the agency that broke a working shop to feel modern.

So the brief was narrower: take the maatwerk question — "can you build this sofa in this fabric in this size for delivery before date X" — off the customer-service team's queue, without touching the checkout, the planner, or the fabric library. Sidecar, not surgery.

What the agent actually does

The agent answers in Dutch. It reads three things: the product the customer is looking at, the fabric library (4,217 SKUs across silk, wool-blend, velvet, outdoor), and the Rotterdam workshop calendar. It writes back one of three things: a quote with a committed delivery week, a planner-queue ticket with a polite holding message, or a handoff to a human when the question is not about maatwerk at all (it is, twice a day, about a missing delivery, which is not what the agent is for).
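
The three possible replies fit a discriminated union, which forces every code path to say who owns the conversation next. This is an illustrative sketch only: `AgentReply`, its fields and `nextOwner` are hypothetical names, not the project's actual types.

```typescript
// Hypothetical types: names and fields are illustrative, not taken from the project.
type AgentReply =
  | { kind: 'quote'; deliveryWeek: string; message: string }
  | { kind: 'parked'; ticketId: string; message: string }
  | { kind: 'handoff'; reason: string; message: string };

// Exhaustive switch: the compiler complains if a reply kind is left unhandled.
function nextOwner(reply: AgentReply): 'customer' | 'planner' | 'human' {
  switch (reply.kind) {
    case 'quote':
      return 'customer'; // the customer confirms the committed week
    case 'parked':
      return 'planner'; // the planner team confirms the slot
    case 'handoff':
      return 'human'; // not a maatwerk question at all
  }
}
```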

The orchestration layer is a thin Node service running on the same VPS as Magento. It does not call Magento's REST API for stock — that was slow and lied about reserved rolls. It talks to MySQL directly, read-only, against the same tables the shop owner has been reading for thirteen years. The first iteration of the retrieval was Postgres plus pgvector. We threw it out in week two and rebuilt against MySQL's existing fulltext indexes, because the truth lived there and the latency was 40 ms instead of 220 ms.

The 45-second SLA

The hard number is 45 seconds. From the moment a customer sends their first message, the agent must either commit a delivery week or park the conversation in the planner queue. Not "ideally". Mechanically. The orchestrator has a budget timer and the model call is wrapped in a thinking-time guard. If the agent is still thinking at 38 seconds, it stops thinking and parks. The planner team would rather process fifteen extra tickets a day than have one customer waiting 90 seconds while the model hedges.
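
A guard like this can be sketched as a race between the model call and a timer. What follows is a minimal illustration under assumptions, not the project's orchestrator: `withThinkingGuard`, the `Outcome` type and the abort-signal wiring are hypothetical, and the 38-second default simply mirrors the number above.

```typescript
// Hypothetical sketch: names and the abort wiring are assumptions.
type Outcome<T> =
  | { kind: 'answered'; value: T }
  | { kind: 'parked'; afterMs: number };

async function withThinkingGuard<T>(
  think: (signal: AbortSignal) => Promise<T>,
  guardMs: number = 38_000, // stop here, leaving headroom inside the 45 s budget
): Promise<Outcome<T>> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Resolves with "parked" once the guard time has elapsed.
  const timeout = new Promise<Outcome<T>>((resolve) => {
    timer = setTimeout(() => {
      controller.abort(); // lets the model call cancel itself if it honours the signal
      resolve({ kind: 'parked', afterMs: guardMs });
    }, guardMs);
  });

  const answer = think(controller.signal).then(
    (value): Outcome<T> => ({ kind: 'answered', value }),
  );

  try {
    // Whichever settles first wins; a late answer is simply ignored.
    return await Promise.race([answer, timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

The caller treats a `parked` outcome exactly like a lead-time park: queue the ticket, send the holding message, move on.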

The reason the 45-second bound matters is the 17:00 cutoff. The workshop foreman in Rotterdam locks tomorrow's production slot at 17:00 sharp, Europe/Amsterdam time. Anything not in the planner queue with a confirmed fabric by then rolls into the next day, which moves the entire delivery week by one. Across the hundred-plus conversations of a Friday afternoon, the cumulative shift is brutal. The agent's contract with the workshop is: I will never make you wait, and I will never promise something I cannot park.

// rules/leadtime-park.ts
// Quote, Ctx and leadTimeWeeks are defined elsewhere in the orchestrator.
export async function decide(quote: Quote, ctx: Ctx) {
  const weeks = await leadTimeWeeks(quote, ctx);

  // Above eight weeks the agent never commits: park the quote for a human planner.
  if (weeks > 8) {
    await ctx.planner.queue({
      quoteId: quote.id,
      reason: 'leadtime>8w',
      slot: 'next-available',
      lockBeforeUtc: ctx.cutoff(), // 17:00 Europe/Amsterdam, as a UTC instant
    });
    return {
      sayToCustomer:
        `We can build this. The soonest available slot is in ${weeks} weeks. ` +
        `Our planner confirms within one working day.`,
      handoff: 'planner',
    };
  }

  // Eight weeks or less: the agent commits the delivery week itself.
  return {
    sayToCustomer:
      `Yes. Delivery in ${weeks} weeks if you confirm today before 17:00.`,
    handoff: null,
  };
}
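
The listing leans on `ctx.cutoff()` without showing it. One way to turn "17:00 Europe/Amsterdam" into a UTC instant without a date library is to ask `Intl.DateTimeFormat` for the zone's offset. This is a sketch under assumptions: `zoneOffsetMinutes` and `cutoffUtc` are hypothetical helpers, and the real `ctx.cutoff()` may work differently.

```typescript
// Offset of a time zone from UTC, in minutes, at a given instant.
function zoneOffsetMinutes(at: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    hour12: false,
    year: 'numeric', month: 'numeric', day: 'numeric',
    hour: 'numeric', minute: 'numeric', second: 'numeric',
  }).formatToParts(at);
  const get = (type: string): number =>
    Number(parts.find((p) => p.type === type)!.value);
  // Some engines print midnight as hour 24 when hour12 is false, hence "% 24".
  const wallClockAsUtc = Date.UTC(
    get('year'), get('month') - 1, get('day'),
    get('hour') % 24, get('minute'), get('second'),
  );
  return Math.round((wallClockAsUtc - at.getTime()) / 60_000);
}

// Today's cutoff (in the workshop's zone) as a UTC instant.
// Hypothetical helper: it returns today's cutoff even if that moment has passed.
function cutoffUtc(now: Date, timeZone = 'Europe/Amsterdam', hour = 17): Date {
  const offsetNow = zoneOffsetMinutes(now, timeZone);
  // Shift "now" so its UTC fields read like the workshop's wall clock.
  const local = new Date(now.getTime() + offsetNow * 60_000);
  const wallCutoff = Date.UTC(
    local.getUTCFullYear(), local.getUTCMonth(), local.getUTCDate(), hour,
  );
  // Re-read the offset near the cutoff itself, in case a DST switch happened earlier that day.
  const offsetAtCutoff = zoneOffsetMinutes(
    new Date(wallCutoff - offsetNow * 60_000), timeZone,
  );
  return new Date(wallCutoff - offsetAtCutoff * 60_000);
}
```

At 16:42 on 18 June 2026 in Apeldoorn (14:42 UTC), `cutoffUtc(new Date('2026-06-18T14:42:00Z')).toISOString()` gives `2026-06-18T15:00:00.000Z`, eighteen minutes away.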

The eight-week parking rule

Every delivery promise (levertijd-belofte) above eight weeks goes into the planner queue. The agent does not negotiate this. The number is not magic; it is the lead time beyond which the workshop's historical confidence in hitting a delivery date drops below 90%, based on three years of data the client handed us in a CSV the first week. At eight weeks or less, the agent can commit. Above eight, only a human can.
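
For readers who want to find their own boundary, one plausible way to derive it from order history is to bucket past promises by lead time and find the longest lead time that still meets the target on-time rate. This is a hedged sketch, not a description of how the workshop's figure was computed: the `PastOrder` shape and `maxCommitWeeks` are assumptions.

```typescript
// Hypothetical record shape; the source only says the history arrived as a CSV.
type PastOrder = { promisedWeeks: number; onTime: boolean };

// Largest lead time (in weeks) up to which every bucket keeps an on-time rate
// at or above the target. Returns 0 if even the shortest bucket falls short.
function maxCommitWeeks(history: PastOrder[], target = 0.9): number {
  const buckets = new Map<number, { total: number; onTime: number }>();
  for (const order of history) {
    const bucket = buckets.get(order.promisedWeeks) ?? { total: 0, onTime: 0 };
    bucket.total += 1;
    if (order.onTime) bucket.onTime += 1;
    buckets.set(order.promisedWeeks, bucket);
  }

  let limit = 0;
  const weeks = Array.from(buckets.keys()).sort((a, b) => a - b);
  for (const w of weeks) {
    const bucket = buckets.get(w)!;
    if (bucket.onTime / bucket.total < target) break; // first bucket below target ends the scan
    limit = w;
  }
  return limit;
}
```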

What we learned was that the parking message is more important than the commit message. Customers who get "your custom build is in the planner queue, our team confirms within one working day" do not bounce. Customers who got the old "it might be 9-12 weeks, we'll come back to you" template abandoned at 31%. The agent's parking copy went through six rewrites before the abandonment rate on parked conversations dropped below 4%. The phrase that finally worked was the boring one: a fixed time, a named owner, and one concrete next step.

RAG over a thirteen-year-old fabric library

The fabric library is where most of the time went. 4,217 SKUs, of which about 1,100 are actively stocked. The rest are discontinued, archived, or available on order from a mill in Como with a six-week lead time. The customer types things like "die donkerblauwe linnen die mijn moeder vorig jaar besteld heeft" — that dark-blue linen my mother ordered last year. The agent has to map that to either an in-stock SKU, a discontinued SKU with a plausible alternative, or a mill-order with a fresh quote.

We tried pgvector first. It was fine. Then we noticed that the shop's existing MySQL fulltext index, combined with a hand-curated synonym list the customer-service lead had been maintaining since 2017 in a Google Sheet, beat the vector retrieval on top-1 accuracy by eleven points. The Google Sheet is now a database table, refreshed nightly by the same lead, who has stopped firefighting and started owning the synonym layer.

The reason was not the algorithm. It was the synonyms. Customers in Apeldoorn do not type SKU names. They type the colour their mother used in 2018, the texture a showroom assistant mentioned in March, the price band they vaguely remember. Just over three thousand of those mappings already lived in the sheet, built one customer call at a time since 2017. The vector store would have had to learn them. The fulltext index already knew them by name.
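
A synonym layer like this can sit in front of the fulltext query as plain string expansion. The sketch below shows one possible shape, assuming a table of customer phrases mapped to catalogue terms. `SynonymRow` and `expandQuery` are hypothetical, and the sample phrase in the usage note is invented for illustration.

```typescript
// Hypothetical shape of one synonym-table row: a customer phrase and the
// catalogue terms it should also search for.
type SynonymRow = { phrase: string; terms: string[] };

// Appends catalogue terms for every known phrase found in the customer's text.
function expandQuery(customerText: string, synonyms: SynonymRow[]): string {
  const text = customerText.toLowerCase();
  const extra: string[] = [];
  for (const row of synonyms) {
    if (text.includes(row.phrase.toLowerCase())) {
      for (const term of row.terms) {
        if (!extra.includes(term)) extra.push(term); // keep each term once
      }
    }
  }
  // The original wording stays in; the extra terms give the fulltext index more to match on.
  return [customerText, ...extra].join(' ');
}
```

With a made-up row `{ phrase: 'donkerblauw', terms: ['navy', 'marine'] }`, the input `die donkerblauwe linnen` becomes `die donkerblauwe linnen navy marine` before it reaches the fulltext search.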

-- fabric_lookup.sql: called from the retriever per turn
-- Both ? placeholders bind the same search string.
SELECT
  f.sku,
  f.name_nl,
  f.composition,
  f.width_cm,
  f.rub_count,
  f.discontinued_at,
  s.rolls_available,
  s.rack_location
FROM fabric_stock_extended f
LEFT JOIN fabric_stock_live s
  ON s.sku = f.sku
WHERE
  MATCH(f.name_nl, f.search_blob)
    AGAINST (? IN NATURAL LANGUAGE MODE)
  AND (f.discontinued_at IS NULL
       OR f.discontinued_at > NOW() - INTERVAL 24 MONTH)
ORDER BY
  (s.rolls_available > 0) DESC,
  MATCH(f.name_nl, f.search_blob) AGAINST (?) DESC
LIMIT 12;

The 24-month discontinued window matters: customers who reference a fabric "from a year or two ago" almost always mean something the shop sold within that window. Anything older returns through a separate archival path with a clear caveat in the customer-facing message. There is no point retrieving a 2014 silk for a 2026 buyer.

What broke in week two

Two things broke in week two. The first was that the agent confidently promised a sofa in a fabric the workshop had refused to use for upholstery since 2021, because the Martindale rub count was too low for seating. The fabric library had the rub count but did not flag it as upholstery-unsuitable; the rule lived in the foreman's head. We fixed it the way we always fix this kind of thing: the foreman dictated the rules, a junior engineer wrote them as a YAML file in the agent's context, and the foreman now reviews the file once a month over a coffee. There are 31 rules. They are extremely boring. They are the most valuable artefact in the project.

A sample, paraphrased: no Martindale below 25,000 rubs on a seat cushion. No linen as outdoor upholstery, even if the customer insists. No mixing two woven patterns on the same frame without a showroom photo signed off by the foreman. No velvet on a daybed with a dog in the household. The rules do two jobs at once. They keep the agent from promising builds the foreman would have to refuse on the floor, and they make tacit workshop knowledge legible to a junior engineer who joined six weeks ago and has never seen the rack of discontinued silks.
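
Two of those sample rules, written as data plus a checker, might look like the sketch below. It is an illustration under assumptions: the real rules live in the foreman's YAML file, and the `Build` and `Rule` shapes here are hypothetical.

```typescript
// Hypothetical shapes; the real rule file is YAML and owned by the foreman.
type Build = {
  fabricType: string;
  martindaleRubs: number;
  use: 'seat' | 'outdoor' | 'decorative';
};
type Rule = { id: string; reason: string; refuseWhen: (b: Build) => boolean };

// Two of the paraphrased rules from the text, expressed as predicates.
const sampleRules: Rule[] = [
  {
    id: 'min-rubs-seat',
    reason: 'Martindale below 25,000 rubs on a seat cushion',
    refuseWhen: (b) => b.use === 'seat' && b.martindaleRubs < 25_000,
  },
  {
    id: 'no-linen-outdoor',
    reason: 'Linen is not used as outdoor upholstery',
    refuseWhen: (b) => b.use === 'outdoor' && b.fabricType === 'linen',
  },
];

// Returns the reasons a build must be refused; an empty list means no rule objects.
function violations(build: Build, rules: Rule[] = sampleRules): string[] {
  return rules.filter((r) => r.refuseWhen(build)).map((r) => r.reason);
}
```

Returning reasons rather than a bare boolean gives the agent something concrete to tell the customer, and gives the foreman something concrete to review.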

The second thing was MySQL replication lag on the read-replica we had pointed the agent at. Under load the agent occasionally quoted a roll that had been reserved 90 seconds earlier by another conversation. We moved the stock read to the primary, accepted the 8 ms penalty, and added a 15-second reservation hold the moment a delivery promise lands. No double-bookings since.
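
The hold itself is a small piece of logic: first conversation to claim a roll wins, everyone else is refused until the hold expires. The sketch below keeps holds in memory purely to show that logic. It is an assumption, not the shipped design; with more than one orchestrator process the hold would have to live somewhere shared, such as the primary database. `ReservationHolds` and `tryHold` are hypothetical names.

```typescript
// In-memory illustration only; names and storage are assumptions.
class ReservationHolds {
  private holds = new Map<string, { conversationId: string; expiresAt: number }>();

  constructor(
    private ttlMs: number = 15_000, // the 15-second hold from the text
    private now: () => number = Date.now, // injectable clock, handy for tests
  ) {}

  // True if the roll is now held for this conversation,
  // false if another conversation still holds it.
  tryHold(rollId: string, conversationId: string): boolean {
    const t = this.now();
    const existing = this.holds.get(rollId);
    if (existing && existing.expiresAt > t && existing.conversationId !== conversationId) {
      return false;
    }
    // New hold, expired hold, or the same conversation refreshing its own hold.
    this.holds.set(rollId, { conversationId, expiresAt: t + this.ttlMs });
    return true;
  }
}
```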

Warning

If your agent reads stock from a replica, your numbers will lie under load. The fix is not "tune the replica" — it is "do not read stock from a replica, ever".

The numbers after eleven weeks

Eleven weeks in, here is what we measured. The agent handles 1,680 maatwerk conversations a week, up from the 410 a week the same team could process by hand. The 17:00 cutoff has been missed twice in eleven weeks, both times because of a Stripe outage that blocked the deposit step (not the agent's fault, but the customer-service lead still bought us a stroopwafel and gave us a stern look). 71% of conversations end with a commit message, 23% with a planner-queue parking message, and 6% with a handoff to a human. Of the parked conversations, the abandonment rate is 3.8%, down from 31% on the old template.

Customer satisfaction, measured on a one-to-five star prompt sent after delivery, has moved from 4.1 to 4.4. The model bill, charged per conversation, is around twenty cents on a good day and forty on a bad one; across 1,680 conversations that works out to roughly €340 to €670 a week. The customer-service lead now spends her afternoons curating the synonym table and reviewing the foreman's rule file. She has stopped working past 17:30. She told us, with the appropriate amount of Dutch suspicion, that she had expected to hate the project.

What we did not build

We did not migrate Magento. We did not touch PHP 7.2. We did not replace the planner. The owner has a private timeline for all three, and when he is ready, we will be there for it. But this project is a reminder that a chat agent does not require a modern stack underneath it. It requires a clean contract with the systems already running the business.

The same week we shipped, an open-source AI CAD tool called Adam launched on Hacker News and reached the front page with 188 points. For a maatwerk shop, the gap between "I want a sofa in this fabric at this size" and a workshop-ready spec is still half-manual. The chat agent closes one end of that gap. CAD agents will close the other. That is a development worth watching for the next phase of this client's roadmap.

When we built this agent for the Apeldoorn furniture shop, the thing we ran into was that the fabric library knew everything about the fabrics and nothing about how they were used. We solved it with a YAML rules file the foreman owns and a synonym table the customer-service lead owns — the kind of small, durable structure that turns a legacy database into something an AI agent can actually act on.

The smallest thing you could try this week

Pick the one question your customer-service team answers most often. Write the contract — input, output, acceptable latency, what to do when the answer is "no" — on a single A4 page, in prose. If your stack is fifteen years old, write the contract against the database tables you already trust, not the API you wish you had. Then build the agent against that contract, not against the model. The model is the cheap part.

Key takeaway

A chat agent does not need a modern stack underneath it. It needs a clean contract with the systems already running the business.

FAQ

Why didn't you migrate Magento 2.3 before building the agent?

The owner's eighteen-month roadmap is about volume, not refactor. The site converts, payments go through, and the workshop knows the database. We built the agent against the running stack rather than break a working shop.

Can the agent overpromise on delivery dates?

No. Anything above eight weeks is parked in the planner queue for a human to confirm. Eight weeks is the lead time beyond which the workshop's historical confidence in a delivery date drops below 90%.

Why MySQL fulltext instead of a vector database?

We tried pgvector first. The shop's existing fulltext index plus a hand-curated synonym table beat it on top-1 accuracy by eleven points and ran at 40 ms instead of 220 ms. The truth already lived in MySQL.

What happens if the model is slow?

The orchestrator has a 45-second budget. At 38 seconds it stops thinking and parks the conversation in the planner queue with a holding message. The workshop would rather get extra tickets than make a customer wait.

chat agents · ai agents · magento · php · legacy sites · case study
