Telegram agent for a crypto-tax firm: triage without advice

A Dutch crypto-tax firm came to us in February with a Telegram inbox that broke every March. We built an agent that answers what it can and hands off what it cannot.

Jacob Molkenboer · Founder · A Brand New Company · 17 Jun 2024 · 9 min

On a Sunday night in late February, the managing partner of a Dutch crypto-tax firm forwarded us a screenshot. It was their main Telegram group: 312 unread messages, three pinned reminders about CSV exports, and a single question buried at message 244 that read, in Dutch, "Do I have to report a bridge between two L2s?"

The filing deadline was May 1st. The firm had fourteen people and roughly 600 clients, almost all of whom preferred Telegram to email. The partner had spent his Saturday morning scrolling, not advising. He asked the right question: "Can a bot read this for us, but not answer the real ones?"

That is the post. We built that bot. It went live four weeks later. Here is what is in it and what we got wrong.

The shape of a tax-firm inbox

The first thing you do with any support channel is read it. Not skim. Read every message for a week. We did that for this firm and tagged each one. The split was not subtle.

About 64% of the messages were not tax questions at all. They were variations on five things:

"Did you get my export?"
"Which exchanges do you support?"
"What's the deadline?"
"Can I drop off documents Friday?"
"Status of my filing?"

The remaining 36% were genuine tax questions. Bridging ETH from L1 to an L2. Loss harvesting on a delisted token. Whether a wrapped-asset position triggers a taxable event on unwrap. Whether an airdrop is income in the year it lands or the year it vests.

Those questions need a registered belastingadviseur. Not a model. Not a model with a confidence threshold. A human, with insurance, who has signed off on the answer. Dutch tax-advice rules and the firm's professional liability insurance both require it.

The job was never "answer everything." The job was "answer the 64%, recognise the 36% in under two seconds, and get a human into the chat fast."

Inverting the usual goal

Most chat-agent projects try to maximise auto-resolution. For a regulated firm, that target is dangerous. We inverted it. The bot's job, in order:

1. Answer the boring questions well.
2. Recognise a regulated question instantly.
3. Acknowledge in Dutch within two seconds.
4. Get a registered adviser into the conversation within 90 seconds.
5. Never produce text that looks like advice on a regulated topic.

"Never" is doing work in that list. We will come back to it.

The classifier sits in front of everything

We don't run one big prompt. There is a small classifier in front, and the rest of the system is downstream of its decision. Every inbound Telegram message goes through it before anything else happens.

Four buckets, with the default set to hand off:

LOGISTICS: deadlines, office hours, document status. The bot answers from a small knowledge base.
DOCUMENT: client is sending a CSV, screenshot, or voice note. The bot acknowledges, files the artefact, and tells the client when an adviser will look at it.
ADVICE_NEEDED: anything that touches the tax treatment of a transaction, position, or strategy. Hand off.
AMBIGUOUS: model is not confident. Hand off.

The classifier prompt is shorter than you would think. The structured-output schema does most of the work.

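// classifier.ts
import { z } from "zod";

// Runtime schema for the classifier's structured output.
export const Classification = z.object({
  bucket: z.enum(["LOGISTICS", "DOCUMENT", "ADVICE_NEEDED", "AMBIGUOUS"]),
  confidence: z.number().min(0).max(1),
  one_line_reason: z.string().max(140),
  detected_topics: z.array(z.string()).max(5),
});

// Static type under the same name, so other modules can write `c: Classification`.
export type Classification = z.infer<typeof Classification>;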

export const SYSTEM_PROMPT = `
You triage inbound Telegram messages for a Dutch crypto-tax firm.
You do not answer the message. You only assign one of four buckets.

Bucket rules:
- LOGISTICS: deadlines, opening hours, document status, contact info.
- DOCUMENT: client is sending or referencing a file, screenshot, voice note.
- ADVICE_NEEDED: anything about tax treatment, reporting obligations,
  cost basis, gains/losses, specific transactions, or "what should I do".
- AMBIGUOUS: anything else, or when in doubt.

If confidence is below 0.85 for any bucket other than ADVICE_NEEDED, return AMBIGUOUS.
The cost of a false LOGISTICS is high. The cost of a false ADVICE_NEEDED is low.
`;

That last line is the whole design philosophy. We over-route to humans on purpose. The firm would rather their advisers see ten extra logistics questions a day than have the bot answer one regulated question.
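A prompt rule is a request, not a guarantee, so the same bias can be repeated in code after the model replies. The following is a minimal sketch under assumptions, not the production classifier: it assumes the model's reply arrives as a raw JSON string, it uses a hand-written check instead of the zod schema so that it runs without dependencies, and `resolveBucket` and `CONFIDENCE_FLOOR` are hypothetical names.

```typescript
// Sketch only: resolveBucket and CONFIDENCE_FLOOR are illustrative names.
type Bucket = "LOGISTICS" | "DOCUMENT" | "ADVICE_NEEDED" | "AMBIGUOUS";

const BUCKETS: string[] = ["LOGISTICS", "DOCUMENT", "ADVICE_NEEDED", "AMBIGUOUS"];
const CONFIDENCE_FLOOR = 0.85; // same threshold as the prompt

function isBucket(value: unknown): value is Bucket {
  return typeof value === "string" && BUCKETS.indexOf(value) !== -1;
}

// Turns raw model output into the bucket the router acts on.
// Every failure path ends in AMBIGUOUS, which means "hand off".
function resolveBucket(rawModelOutput: string): Bucket {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawModelOutput);
  } catch {
    return "AMBIGUOUS"; // the reply was not JSON at all
  }
  if (typeof parsed !== "object" || parsed === null) return "AMBIGUOUS";

  const { bucket, confidence } = parsed as { bucket?: unknown; confidence?: unknown };
  if (!isBucket(bucket)) return "AMBIGUOUS";
  // Rejects non-numbers, NaN, and values outside [0, 1].
  if (typeof confidence !== "number" || !(confidence >= 0 && confidence <= 1)) {
    return "AMBIGUOUS";
  }

  // ADVICE_NEEDED is never downgraded: a false ADVICE_NEEDED is cheap.
  if (bucket === "ADVICE_NEEDED") return bucket;
  return confidence >= CONFIDENCE_FLOOR ? bucket : "AMBIGUOUS";
}
```

With this in place, a malformed or low-confidence reply lands in the hand-off path whether or not the model followed the prompt.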

Warning

If you are building a chat agent for a regulated firm, set your default bucket to "hand off to human" and write the prompt so the model has to argue its way out of that default. Most teams do the opposite, and most teams ship bots that occasionally give advice they should not.

Acknowledge in two seconds, hand off in ninety

The handoff matters more than the classification. The classifier is the easy part. Getting a human into the conversation fast, without making the client feel like they are being shuffled, is the hard part.

When a message lands in ADVICE_NEEDED or AMBIGUOUS, the bot does five things in sequence:

1. Within two seconds, replies in the Telegram thread: "Een adviseur kijkt hier nu naar. Een moment."
2. Posts a structured ticket into the firm's internal Slack with the message, the client name, the last five messages of context, the classifier's reasoning, and a one-line summary.
3. Pings the adviser on duty for that client. Each client has a primary and a backup adviser.
4. If no human has reacted in Slack within 90 seconds, pings the backup.
5. If still no human at 180 seconds, posts a softer fallback in the Telegram chat: "Het duurt iets langer dan normaal. Iemand reageert binnen vijftien minuten."

The 90-second number is not magic. We tested 60, 90, and 120. Sixty was too tight: advisers were getting pinged while still on a call. One-twenty felt too long to the partner. Ninety landed.

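// handoff.ts
// telegram, slack, and the helper functions are project-local and defined elsewhere.
import type { Classification } from "./classifier";

async function handoff(msg: TelegramMessage, c: Classification) {
  // 1. Acknowledge in the client's chat before doing anything slower.
  await telegram.sendMessage(msg.chat_id, ACK_NL);

  // 2. Post the structured ticket to the internal triage channel.
  const ticket = await slack.postTicket({
    channel: SLACK_TRIAGE,
    client: await lookupClient(msg.from),
    context: await lastNMessages(msg.chat_id, 5),
    classifier: c,
    summary: await summarise(msg.text, { maxChars: 180 }),
  });

  // 3. Ping the primary adviser and wait up to 90 seconds for a claim.
  await pingAdviser(ticket.primary, ticket.url);
  const claimed = await waitForClaim(ticket.id, 90_000);
  if (claimed) return;

  // 4. Escalate to the backup and wait another 90 seconds (180 in total).
  await pingAdviser(ticket.backup, ticket.url);
  const claimedBackup = await waitForClaim(ticket.id, 90_000);

  // 5. Still unclaimed: tell the client it is taking longer than usual.
  if (!claimedBackup) {
    await telegram.sendMessage(msg.chat_id, SOFT_FALLBACK_NL);
  }
}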

"Claim" is a Slack reaction. An adviser reacts to the ticket with an emoji to take it. Lightweight, no new tool to learn. The firm already lived in Slack.

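How a reaction becomes a claim can be sketched as a small pure function. This is an illustration under assumptions rather than the firm's code: the event fields follow Slack's `reaction_added` event, while `TicketRef`, `applyReaction`, and the choice of the `eyes` emoji as the claim signal are hypothetical.

```typescript
// Sketch only. The event shape follows Slack's reaction_added event;
// CLAIM_EMOJI, TicketRef, and applyReaction are hypothetical names.
interface ReactionAddedEvent {
  type: "reaction_added";
  user: string; // Slack user ID of the person who reacted
  reaction: string; // emoji name without colons
  item: { type: string; channel: string; ts: string };
}

interface TicketRef {
  id: string;
  channel: string; // Slack channel the ticket was posted in
  ts: string; // timestamp of the ticket message
  claimedBy?: string;
}

const CLAIM_EMOJI = "eyes"; // assumption: the claim emoji is not specified

// Returns the ticket this reaction claims, or null if it claims nothing.
// The first adviser to react wins; later reactions leave the claim untouched.
function applyReaction(event: ReactionAddedEvent, openTickets: TicketRef[]): TicketRef | null {
  if (event.item.type !== "message" || event.reaction !== CLAIM_EMOJI) return null;
  for (const ticket of openTickets) {
    const sameMessage = ticket.channel === event.item.channel && ticket.ts === event.item.ts;
    if (!sameMessage) continue;
    if (ticket.claimedBy !== undefined) return null; // already taken
    ticket.claimedBy = event.user;
    return ticket;
  }
  return null;
}
```

Matching on channel plus message timestamp is how Slack identifies the message a reaction was added to, so the ticket needs to store both when it is posted.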
Voice notes, edits, and screenshots

Three things in Telegram broke our first version. Worth naming each.

Edited messages

Clients edit their Telegram messages. Often. A client sends "did you get my file", then four minutes later edits it to "did you get my file and is the bridge from L1 to Base a taxable event in NL". The bot had already classified the first version as LOGISTICS and answered. The classifier never saw the edit.

Fix: subscribe to the edited_message update type in the Telegram Bot API, re-classify on every edit, and if the new bucket is ADVICE_NEEDED or AMBIGUOUS, fire a handoff immediately with a "the client edited the question" note in the Slack ticket.
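The re-classification step might look like the sketch below. It is an assumption-laden illustration: `TgUpdate` and `TgMessage` mirror only the Bot API fields used here, `routeUpdate` and `Routing` are hypothetical names, the classifier is injected as a synchronous function for brevity (a real one would be an async model call), and AMBIGUOUS is treated as needing a human in the same way as ADVICE_NEEDED.

```typescript
// Sketch only: routeUpdate and Routing are hypothetical names.
type Bucket = "LOGISTICS" | "DOCUMENT" | "ADVICE_NEEDED" | "AMBIGUOUS";

interface TgMessage {
  message_id: number;
  chat: { id: number };
  text?: string;
}

// A Bot API update carries at most one of these optional fields.
interface TgUpdate {
  update_id: number;
  message?: TgMessage;
  edited_message?: TgMessage;
}

interface Routing {
  bucket: Bucket;
  handoff: boolean;
  note?: string; // extra line for the Slack ticket
}

const NEEDS_HUMAN: Bucket[] = ["ADVICE_NEEDED", "AMBIGUOUS"];

// Classifies new and edited messages through the same path.
// Returns null when the update carries no text to classify.
function routeUpdate(update: TgUpdate, classify: (text: string) => Bucket): Routing | null {
  const edited = update.edited_message !== undefined;
  const msg = update.edited_message ?? update.message;
  if (msg === undefined || msg.text === undefined) return null;

  const bucket = classify(msg.text);
  const handoff = NEEDS_HUMAN.indexOf(bucket) !== -1;
  if (handoff && edited) {
    return { bucket, handoff, note: "the client edited the question" };
  }
  return { bucket, handoff };
}
```

If the bot restricts `allowed_updates` in `getUpdates` or `setWebhook`, `edited_message` has to be in that list, otherwise Telegram will not deliver edits at all. The sketch does not deduplicate: an edit to a message that was already handed off would produce a second handoff.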

Voice notes

A third of the firm's clients send voice notes. Some are 90 seconds long. We added Whisper transcription, but we deliberately do not let the classifier reason over voice content. Voice notes are auto-routed to a human and the transcript is attached to the Slack ticket for the adviser to read. The bot's reply to a voice note is always the same: "Bedankt voor je spraakbericht. Een adviseur luistert het terug en reageert."

The reason is small but real. Voice notes carry tone and nuance that we do not want the classifier to misread. They are also the channel where clients ask the most personal, regulated questions. Default to human.

Screenshots from exchanges

Clients forward screenshots from Kraken, Bitvavo, Coinbase. These are useful artefacts for the adviser. They are also a trap for the bot: a screenshot of a trade history looks like a logistics message ("here is my data") but is almost always attached to a tax question. We OCR every image, attach the text to the Slack ticket, and route screenshots to a human by default.
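Both defaults can be applied before the classifier ever runs, by looking at what kind of message arrived. A minimal sketch, assuming the standard Telegram message fields `voice`, `photo`, `document`, `text`, and `caption`; `preRoute` and `PreRoute` are hypothetical names, and treating image files sent as documents like screenshots is an added assumption.

```typescript
// Sketch only: preRoute and PreRoute are hypothetical names.
interface TgInbound {
  text?: string;
  caption?: string;
  voice?: { file_id: string; duration: number };
  photo?: { file_id: string }[];
  document?: { file_id: string; file_name?: string; mime_type?: string };
}

type PreRoute =
  | { to: "HUMAN"; reason: "VOICE_NOTE" | "IMAGE" }
  | { to: "CLASSIFIER"; text: string };

// Decides whether a message may reach the classifier at all.
// Transcripts and OCR text are produced later, for the Slack ticket only.
function preRoute(msg: TgInbound): PreRoute {
  if (msg.voice !== undefined) return { to: "HUMAN", reason: "VOICE_NOTE" };

  const hasPhoto = msg.photo !== undefined && msg.photo.length > 0;
  // Assumption: an image sent as a file is treated like a screenshot.
  const isImageFile = msg.document?.mime_type?.indexOf("image/") === 0;
  if (hasPhoto || isImageFile) return { to: "HUMAN", reason: "IMAGE" };

  // Photos and documents carry their text in `caption`, plain messages in `text`.
  return { to: "CLASSIFIER", text: msg.text ?? msg.caption ?? "" };
}
```

A CSV export with a caption still reaches the classifier and can land in DOCUMENT, while a screenshot with the caption "here is my data" goes to a human regardless of what the caption says.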

The audit log nobody asked for, until they did

Every classification, every handoff, every bot reply is logged with the model output, the prompt hash, and the timestamp. We did not build this because the firm asked. We built it because the firm's tax practice operates under Belastingdienst guidance for crypto holdings, and "no regulated advice was given by a non-human" is the kind of claim that has to be provable, not just true.

In month three, an auditor asked for exactly that proof. We exported a CSV. The conversation lasted twenty minutes.
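An export like that is only useful if chat text survives the trip into CSV, because messages routinely contain commas, quotes, and line breaks. The sketch below shows one way to serialise log rows with standard CSV quoting. The record fields beyond model output, prompt hash, and timestamp are assumptions, and `AuditRecord`, `csvField`, and `toCsv` are hypothetical names.

```typescript
// Sketch only: the real log schema is not published; these fields are illustrative.
interface AuditRecord {
  timestamp: string; // ISO 8601
  messageId: number;
  bucket: string;
  promptHash: string; // hash of the classifier prompt in force at the time
  modelOutput: string; // raw classifier reply
  botReply: string; // text the bot sent to the client
}

const COLUMNS: (keyof AuditRecord)[] = [
  "timestamp",
  "messageId",
  "bucket",
  "promptHash",
  "modelOutput",
  "botReply",
];

// Quotes a field when it contains a comma, quote, or line break,
// and doubles any embedded quotes.
function csvField(value: string | number): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// One header line followed by one line per record, CRLF-separated.
function toCsv(records: AuditRecord[]): string {
  const header = COLUMNS.join(",");
  const rows = records.map((r) => COLUMNS.map((c) => csvField(r[c])).join(","));
  return [header, ...rows].join("\r\n");
}
```

Storing the prompt hash on every row is what lets a reviewer tie a given classification to the exact prompt version that produced it.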

The numbers we can share

The firm is small, so we will not dress up a handful of messages as percentages. These are real numbers from month one in production, shared with the firm's permission:

Median time to first human reply on a regulated question: down from 6h 40m to 1m 12s.
Adviser hours per week spent on Telegram triage: down from roughly 22 to roughly 6.
Auto-handled (non-handoff) message share: 64%.
Misroutes where the bot answered something it should have escalated: 3 out of ~2,100 messages. All three were caught in the daily review log within 24 hours and followed up by an adviser. None reached the auditor.

The misroute rate is the number we watch. The other numbers move. That one has to stay near zero.

One thing we wouldn't repeat

We built a follow-up agent in week one that nudged clients after 24 hours of silence with "Are you still there?" The advisers hated it. It made the firm feel pushy and impersonal, and it generated more inbound work than it resolved. We killed it in week two. The lesson is older than agents: do not automate the part of the relationship that is the relationship.

How to look at your own inbox tomorrow

If you run a small services firm and you suspect you have an agent-shaped problem, do this before you talk to anyone about building one. Open your support channel. Read the last 200 messages. Tag each one as LOGISTICS, DOCUMENT, ADVICE_NEEDED, or AMBIGUOUS. If the first two are more than half, the bot is worth building. If they are not, you have a hiring or training problem, not an automation problem.
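If the tags end up in a spreadsheet or a list, the check is a few lines of arithmetic. A small sketch, with `automatableShare` and `worthBuilding` as hypothetical names and "more than half" read as strictly greater than 50%:

```typescript
// Sketch only: a tally over hand-tagged messages.
type InboxTag = "LOGISTICS" | "DOCUMENT" | "ADVICE_NEEDED" | "AMBIGUOUS";

// Share of messages a bot could plausibly handle without a human.
function automatableShare(tags: InboxTag[]): number {
  if (tags.length === 0) return 0;
  const automatable = tags.filter((t) => t === "LOGISTICS" || t === "DOCUMENT").length;
  return automatable / tags.length;
}

// "More than half" is read as strictly greater than 50%.
function worthBuilding(tags: InboxTag[]): boolean {
  return automatableShare(tags) > 0.5;
}
```

For 200 tagged messages, 128 in the first two buckets gives a share of 0.64, which clears the bar. Exactly 100 does not.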

When we built this Telegram agent for the crypto-tax firm, the thing that surprised us was how much of the work was in the handoff, not the classifier. We solved it by making the default decision "get a human" and treating the bot as a fast triage nurse, not a doctor. If you are looking at AI agents for a regulated practice, that inversion is the move worth copying.

Key takeaway

At a regulated firm, the bot's job is not to answer everything. It is to triage fast and get a human into the chat before the client gets annoyed.

FAQ

Why Telegram and not WhatsApp or email?

The firm's clients already used Telegram. We never tell a client base to switch channels for our convenience. The same triage pattern works on WhatsApp Business and on shared inboxes.

What model do you use for the classifier?

A small, fast model with a structured-output schema. The classifier does not need to be the smartest model in the stack. It needs to be fast, cheap, and biased toward escalation.

How do you prove the bot never gave regulated advice?

Every classification, prompt hash, and reply is logged with a timestamp and can be exported on demand. The firm also runs a daily review of every message the bot classified as LOGISTICS.

How long did it take to build?

Four weeks from kickoff to production. Week one was reading the inbox. Week two was the classifier. Week three was the Slack handoff. Week four was edits, voice notes, and screenshots.

What happens when the bot misclassifies?

The daily review log surfaces it. An adviser follows up with the client, usually within hours. The misclassified prompt becomes a test case in the regression set the next morning.
