
Strategy

AI chat widgets are the wrong first deployment: three fixes

A chat widget on a sub-€5M site can get four messages a day, three of them spam. We have stopped recommending it as a first AI deploy. Here is what we do instead.

Jacob Molkenboer · Founder · A Brand New Company · 21 Jul 2024 · 6 min

The chat widget that never rang

A founder in Utrecht showed us his SaaS dashboard last month. €1.4M ARR, twelve people on the team, and a chat widget in the corner of the marketing site that he had been paying €299 a month since the previous October to make "AI-powered". He pulled up the analytics: four conversations the previous day. Three were spam. The fourth said "how much does this cost." The bot replied with a link to the pricing page. The founder had spent eight months iterating prompts for a system that handled less work than his autoresponder.

We have built fourteen agents in production for clients between €500k and €50M in revenue. For most of them, the chat widget is the first thing they ask us about. And for most of them, we now say no.

Why volume matters more than vision

A chat widget is a synchronous surface. It only works if the visitor is on the page right now, with a question they want answered immediately, and enough trust in the bot to try it rather than waiting until business hours to send an email. For a B2C site with a million monthly visitors, that adds up to thousands of useful conversations a month. For a sub-€5M B2B SaaS or services site, the same widget gets something like four messages a day: two from competitors, one from a job seeker, and one from a buyer who would have emailed anyway.

Intercom built a category around chat-first support. Their AI agent Fin is good. It is also priced and tuned for companies with the conversation volume to make it pay back. If you do not have that volume, you do not have the data to tune the agent, the patience to debug it, or the wins to justify it on the next board call.

The deeper problem is that a chat widget spends your most valuable real estate, the homepage, on a feature most visitors never use. Meanwhile the real bottlenecks at a sub-€5M company are nowhere near the front door.

Surface one: the shared inbox

Almost every company between €500k and €5M has an inbox like info@, sales@, or hello@. It is typically shared between two and five people and receives between forty and four hundred messages a day. About 60% is noise (recruiter spam, newsletter pitches, cold outbound). 25% is in-progress work that needs to be routed. 10% is genuinely new buyer intent. The last 5% is hidden urgency (a stuck invoice, a churn risk, an angry partner) that should have been flagged hours ago.

An inbox agent is the chat widget you wish you had, except it runs where you actually have volume. It labels and routes mail to the right person. It drafts replies that the team approves before sending. It flags the handful of messages that need a human in the next thirty minutes. And it works at 3 a.m. on a Saturday without anyone watching it.
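To make the label-and-route step concrete, here is a minimal Python sketch. It is not how a production agent decides (that would be an LLM or a trained classifier); the keyword lists, the four labels mirroring the breakdown above, and the `ROUTES` table are all hypothetical. What it does show is the ordering: check for urgency first, so a stuck invoice never lands in the noise pile, and send anything unrecognised to a human rather than the archive.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical keyword lists; a real agent would use an LLM or trained classifier.
URGENT = ("overdue", "cancel", "refund", "not working", "complaint")
NOISE = ("unsubscribe", "recruiter", "guest post", "seo services")

# Hypothetical routing table: label -> who owns it.
ROUTES = {"urgent": "founder", "in_progress": "ops",
          "new_intent": "sales", "noise": "archive"}

@dataclass
class Message:
    sender: str
    subject: str
    body: str

def classify(msg: Message) -> str:
    text = f"{msg.subject} {msg.body}".lower()
    if any(k in text for k in URGENT):  # urgency is checked before noise
        return "urgent"
    if any(k in text for k in NOISE):
        return "noise"
    if msg.subject.lower().startswith(("re:", "fwd:")):
        return "in_progress"  # reply on an existing thread
    return "new_intent"  # unknown mail goes to a human, not the archive

def triage(inbox: List[Message]) -> List[Tuple[str, str, Message]]:
    """Label and route every message; urgent items come first."""
    labelled = [(classify(m), m) for m in inbox]
    labelled.sort(key=lambda pair: pair[0] != "urgent")  # stable: others keep order
    return [(label, ROUTES[label], m) for label, m in labelled]
```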

This is async by nature. Senders do not expect a reply within ten seconds. They expect one within a business day. That gives the agent room to ask clarifying questions, route, summarise, and let humans approve the borderline cases.
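The async buffer is what makes draft-then-approve workable. A minimal sketch, assuming a one-business-day reply target (weekends skipped, public holidays ignored) and a hypothetical `send` callback standing in for your mail API: the agent only ever proposes drafts, a human approves or edits them, and anything still pending past its deadline shows up on an overdue list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional

def next_business_day(ts: datetime) -> datetime:
    """Same clock time one business day later; skips Sat/Sun, ignores holidays."""
    nxt = ts + timedelta(days=1)
    while nxt.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        nxt += timedelta(days=1)
    return nxt

@dataclass
class Draft:
    thread_id: str
    text: str
    received: datetime
    status: str = "pending"  # pending -> approved -> sent

    @property
    def reply_by(self) -> datetime:
        return next_business_day(self.received)

class ApprovalQueue:
    def __init__(self) -> None:
        self.drafts: Dict[str, Draft] = {}

    def propose(self, draft: Draft) -> None:
        """The agent can only add drafts; it has no send path of its own."""
        self.drafts[draft.thread_id] = draft

    def approve(self, thread_id: str, edited_text: Optional[str] = None) -> None:
        draft = self.drafts[thread_id]
        if edited_text is not None:
            draft.text = edited_text  # the reviewer's edit wins
        draft.status = "approved"

    def send_approved(self, send: Callable[[str, str], None]) -> List[str]:
        sent = []
        for draft in self.drafts.values():
            if draft.status == "approved":
                send(draft.thread_id, draft.text)
                draft.status = "sent"
                sent.append(draft.thread_id)
        return sent

    def overdue(self, now: datetime) -> List[str]:
        """Pending drafts past their one-business-day target."""
        return [d.thread_id for d in self.drafts.values()
                if d.status == "pending" and now > d.reply_by]
```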

Surface two: outbound chase

The second surface lives behind the company, not in front of it. It is the work nobody on the team likes doing: chasing overdue invoices, following up with dormant leads, pinging customers whose subscription charges just failed.

One operations lead we work with had a spreadsheet of 412 overdue invoices when we met her. Her chase process was a half day once a week, which meant the average invoice was 23 days late before it got its first reminder. We replaced the half day with an agent that drafts a polite chase email at 7, 14, and 28 days overdue, pulls in the invoice PDF and the original PO, and waits for her to approve sending. The average collection cycle dropped to nine days.
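The chase schedule itself is a few lines of date arithmetic. This sketch assumes the 7, 14, and 28-day marks count from the invoice due date, and the `Invoice` fields (including the PDF and PO paths) are hypothetical. One design choice worth copying: if several marks have passed unsent, as with a 412-invoice backlog, it returns only the latest, so a customer gets one reminder rather than three at once.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set

CHASE_MARKS = (7, 14, 28)  # assumed: days past the invoice due date

@dataclass
class Invoice:
    number: str
    due: date
    pdf_path: str  # hypothetical field names
    po_path: str
    sent_marks: Set[int] = field(default_factory=set)

def next_chase(inv: Invoice, today: date) -> Optional[int]:
    """Return the chase mark due today, or None.

    If several marks have passed unsent, only the latest is returned,
    so the customer gets one email instead of a burst.
    """
    days_overdue = (today - inv.due).days
    reached = [m for m in CHASE_MARKS if days_overdue >= m]
    if not reached or reached[-1] in inv.sent_marks:
        return None
    return reached[-1]

def draft_chase(inv: Invoice, mark: int) -> dict:
    """Build a draft for human approval; nothing is sent here."""
    return {
        "subject": f"Invoice {inv.number}: {mark} days overdue",
        "attachments": [inv.pdf_path, inv.po_path],
        "status": "awaiting_approval",
    }
```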

The same shape works for any "we should follow up" task: dormant trial users, expired contracts, payment-method warnings, NPS-detractor outreach. The agent drafts. A human approves. The email goes out.
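Because every one of these tasks has the same shape, they can share one trigger table. In this sketch the `Account` fields and the 14-days-without-login threshold for "dormant" are assumptions; the detractor rule uses the standard NPS cut-off of 6 or below. Each match would become a draft that waits for human approval, just like the invoice chase.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List, Optional, Tuple

@dataclass
class Account:
    name: str
    last_login: date
    contract_end: date
    payment_failed: bool
    nps: Optional[int]  # None if the customer never answered the survey

# Hypothetical trigger table: (task, predicate). Thresholds are assumptions.
RULES: List[Tuple[str, Callable[[Account, date], bool]]] = [
    ("dormant_user", lambda a, today: today - a.last_login > timedelta(days=14)),
    ("expired_contract", lambda a, today: a.contract_end < today),
    ("payment_method_warning", lambda a, today: a.payment_failed),
    ("nps_detractor", lambda a, today: a.nps is not None and a.nps <= 6),
]

def follow_up_tasks(accounts: List[Account], today: date) -> List[Tuple[str, str]]:
    """Return (task, account name) pairs; each becomes a draft for approval."""
    return [(task, a.name) for a in accounts for task, check in RULES
            if check(a, today)]
```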

Takeaway

The best first AI deploy is not the one your customers see. It is the one your team would have done by hand on Friday afternoon.

Surface three: the team knowledge desk

The third surface is internal. Every company we work with has a folder full of PDFs, Notion pages, contracts, onboarding docs, runbooks, and Slack threads that contain answers to questions the team asks every day. "What is the warranty policy for B2B customers?" "Which agency handled the SEO migration last year?" "What did we agree with the German distributor about returns?"

A knowledge agent in Slack or Teams answers these in five seconds with citations to the source doc. It is not customer-facing. It does not need a polished response style. It does not need to be funny or on-brand. It needs to be right, traceable, and available where the team is already working.
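The "right and traceable" requirement is what shapes the code. This toy retriever scores sentences by word overlap instead of the embeddings and LLM a real knowledge agent would use, and the `Doc` sources and stopword list are hypothetical. The part that carries over is the contract: every answer comes with its source, and when nothing matches well enough, the agent says so and escalates rather than guessing.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

@dataclass
class Doc:
    source: str  # hypothetical: a Notion URL, Drive path, or Slack permalink
    text: str

# Small hypothetical stopword list so "what" and "the" do not count as matches.
STOPWORDS = {"a", "an", "the", "is", "for", "what", "which", "did", "we",
             "about", "with", "of", "to", "last", "year"}

def tokens(s: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", s.lower()))

def answer(question: str, docs: List[Doc], min_overlap: int = 2) -> str:
    """Return the best-matching sentence with its source, or escalate."""
    q = tokens(question) - STOPWORDS
    best: Optional[Tuple[str, str]] = None
    best_score = 0
    for doc in docs:
        # naive sentence split on ., ! or ? followed by whitespace
        for sentence in re.split(r"(?<=[.!?])\s+", doc.text):
            score = len(q & tokens(sentence))
            if score > best_score:
                best, best_score = (sentence, doc.source), score
    if best is None or best_score < min_overlap:
        return "Not sure, no source found. Escalating to a human."
    sentence, source = best
    return f"{sentence} [source: {source}]"
```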

This is also where the technical risk is lowest. The blast radius of a bad answer is small (a colleague reads the cited doc and corrects the agent). The volume of useful questions is high (a 12-person team can easily hit 30 a day once they trust it). And the work it replaces (someone pinging the founder on Slack at 11 p.m. to find a contract) costs real money.

The async dividend

All three surfaces share a property the chat widget does not. The user is not waiting on the agent. There is a buffer of seconds or minutes between request and response, which means the agent can think, check, retrieve, draft, and ask a human to verify before anyone reads its answer.

This is the difference between an agent that "works in our demo" and one you keep in production. The async surface forgives slow tools, retried tool calls, ambiguous queries that need clarification, and the occasional "I am not sure, escalating to a human." The chat widget does not. Every one of those failure modes shows up as a frustrated visitor.
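Here is roughly what "forgiving" looks like in code: retry a flaky tool with exponential backoff, then escalate instead of guessing if the tool keeps failing or the answer comes back with low confidence. The confidence score, the 0.8 threshold, and `ToolError` are assumptions for illustration; `sleep` is injectable so the sketch can be tested without actually waiting.

```python
import time
from typing import Callable, Tuple

class ToolError(Exception):
    """Hypothetical: raised by a tool call that failed transiently."""

def run_step(call: Callable[[], Tuple[str, float]], retries: int = 3,
             base_delay: float = 0.5, min_confidence: float = 0.8,
             sleep: Callable[[float], None] = time.sleep) -> dict:
    """Retry a flaky tool with exponential backoff, then escalate rather than guess.

    `call` returns (answer, confidence). The confidence score and the 0.8
    threshold are assumptions for illustration.
    """
    for attempt in range(retries):
        try:
            answer, confidence = call()
            break
        except ToolError:
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)  # waits 0.5s, then 1s, ...
    else:  # loop finished without a successful call
        return {"status": "escalated", "reason": "tool kept failing"}
    if confidence < min_confidence:
        return {"status": "escalated", "reason": "I am not sure, escalating to a human."}
    return {"status": "draft", "answer": answer}
```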

Warning

If your only AI deployment is the one your prospects see first, you are betting your brand on a surface where every failure is public.

The Hacker News crowd has been making a related point more loudly this year. The skeptics are not wrong that a lot of AI deployments are cosmetic. They are wrong that this means AI deployments do not work. The pattern is simple: skip the surface where the cosmetic version fails most visibly (the live chat), pick the surfaces where the boring version pays back hardest (the inbox, the chase, the knowledge desk), and you have your roadmap.

Where to start tomorrow

The five-minute audit is this. Open your shared inbox. Count how many messages came in yesterday. Count how many got a reply within four business hours. Count how many fell through and got a reply two days later, or none at all. That last group is the first agent's job description.
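If your mail tool can export received and first-reply timestamps, the audit is easy to script. This sketch assumes office hours of 09:00 to 17:00, Monday to Friday, with no public holidays; it counts how many messages were answered within four business hours and how many fell through (answered later, or not at all).

```python
from datetime import datetime, timedelta, time as dtime
from typing import List, Optional, Tuple

OPEN, CLOSE = dtime(9, 0), dtime(17, 0)  # assumed office hours, Monday to Friday

def business_hours_between(start: datetime, end: datetime) -> float:
    """Office hours elapsed between two timestamps (weekends skipped, no holidays)."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            window_start = max(start, datetime.combine(day, OPEN))
            window_end = min(end, datetime.combine(day, CLOSE))
            if window_end > window_start:
                total += window_end - window_start
        day += timedelta(days=1)
    return total.total_seconds() / 3600

def audit(messages: List[Tuple[datetime, Optional[datetime]]],
          sla_hours: float = 4.0) -> dict:
    """messages: (received, first_reply) pairs; first_reply is None if unanswered."""
    on_time = sum(
        1 for received, replied in messages
        if replied is not None
        and business_hours_between(received, replied) <= sla_hours
    )
    return {"received": len(messages),
            "replied_on_time": on_time,
            "fell_through": len(messages) - on_time}
```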

When we built the inbox agent for a Dutch logistics SME, we found that 30% of inbound mail was drivers sending shipping photos that needed to be linked to the right consignment. Off-the-shelf email triage tools missed this entirely. We ended up training a small classifier on six months of historical mail and routing the photos to the operations folder before the agent ever drafted a reply. If you want the same shape of work on your stack, the AI agents page has the playbook.
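The important detail is the ordering: classify and route first, draft second. In this sketch a simple rule (any image attachment) stands in for the trained classifier the project actually used, and the consignment number format (`CN-` plus six digits) and the folder names are hypothetical. Photos are filed under their consignment when the number is found, parked in an unmatched folder when it is not, and never reach the drafting step.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Email:
    sender: str
    subject: str
    attachments: List[str] = field(default_factory=list)

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".heic")
CONSIGNMENT_ID = re.compile(r"\bCN-\d{6}\b")  # hypothetical consignment number format

def looks_like_shipping_photo(mail: Email) -> bool:
    """Stand-in rule (any image attachment); the real project trained a classifier."""
    return any(name.lower().endswith(IMAGE_EXTENSIONS) for name in mail.attachments)

def pre_route(mail: Email,
              is_photo: Callable[[Email], bool] = looks_like_shipping_photo) -> dict:
    """Runs before the drafting agent; photos never reach the drafting step."""
    if is_photo(mail):
        match = CONSIGNMENT_ID.search(mail.subject)
        folder = f"operations/{match.group(0)}" if match else "operations/unmatched"
        return {"folder": folder, "draft_reply": False}
    return {"folder": "inbox", "draft_reply": True}
```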

Key takeaway

The first AI deploy that pays back is the one your team would have done on Friday afternoon, not the one your customers were going to see.

FAQ

What about chat widgets for high-traffic B2C sites?

They still work where volume justifies them. The case against is specifically for sub-€5M B2B and service companies whose sites do not hit the conversation count needed to tune and trust the agent.

Won't an inbox agent send the wrong reply?

It drafts, a human approves. Async email gives you the time you need to check. We hold ours in draft-and-review mode until roughly 90% of the team's edits are stylistic rather than factual.
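The 90% rule is simple to track if reviewers tag each edit as stylistic or factual when they change a draft. A minimal sketch, assuming that tagging scheme and a hypothetical floor of 50 tagged edits before the ratio is trusted:

```python
from collections import Counter
from typing import Iterable

def edits_mostly_stylistic(edit_tags: Iterable[str], threshold: float = 0.9,
                           min_edits: int = 50) -> bool:
    """True once at least `threshold` of tagged edits are stylistic, not factual.

    `edit_tags` holds one reviewer tag per edited draft: "stylistic" or
    "factual" (hypothetical scheme; other tags are ignored). `min_edits` is an
    assumed floor so a handful of reviews cannot flip the decision.
    """
    counts = Counter(edit_tags)
    total = counts["stylistic"] + counts["factual"]
    if total < min_edits:
        return False  # not enough evidence yet: stay in draft-and-review
    return counts["stylistic"] / total >= threshold
```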

How long does a first async agent take to ship?

Four to six weeks for an inbox triage agent. Two to four weeks for an invoice chase agent if the billing data lives in one system. Internal knowledge agents are often live in two weeks.

Do we need a separate model or can we use one we already pay for?

Most teams already have a usable model via their CRM, helpdesk, or office suite. The work is rarely in picking the model. It is in scoping the surface, wiring the tools, and tuning the approval loop.

