Slack onboarding agent: a Rotterdam HR-tech case study

The onboarding team had grown to six. Tickets piled up every Monday. Then we wired a Slack chat agent into Notion and rebuilt the flow from scratch.

Jacob Molkenboer · Founder · A Brand New Company · 5 Jun 2026 · 9 min

It was a Monday in February. The head of operations at a Rotterdam HR-tech SaaS opened her Slack and watched the #new-clients channel fill up before she had finished her coffee. Forty-seven kickoff requests over the weekend. Six teammates already pinging her for the priority order. She closed the laptop, went outside, and called us.

This is the case study of what we built next, and what broke first.

The flow she was trying to keep alive

The product is a payroll-and-HR platform for European SMBs. Self-serve for the small accounts, white-glove for the bigger ones, and a fat middle band of 5-to-50 employee companies that needed handholding but did not justify a dedicated CSM. That middle band drove the six-person onboarding team.

Their flow had eight steps. Intake from the sales hand-off doc. Kickoff call. CSV template sent to the customer's admin. Validation of the returned CSV (a moving target, because customers are creative). Import. SSO setup. Manager training session. 30-day check-in. Each step lived in a different tool. Sales used HubSpot, ops lived in Notion, the support team used Intercom, and the engineers had built a small admin CLI that only two people on the team could run.

The math was brutal. Eighty to one hundred and twenty new client companies per month. Six full-time onboarders averaging fifteen kickoffs each, roughly ninety in total against intake that could reach one hundred and twenty, and a queue that always seemed to grow faster than it shrank. The bottleneck was not skill. It was that the same five questions arrived in five different inboxes every single day.

Why we did not build a portal

The first instinct, from the engineering side, was a self-serve onboarding portal. A wizard. A progress bar. The ops lead pushed back hard. "Our customers already have six tabs open. They do not want a seventh."

She was right. The customers were operations leads at restaurant chains, dental clinics, small construction companies. Their day was already a stack of half-finished tabs. What they did open all day was their email and their phone. And increasingly, Slack. About half of the customer base was already on Slack internally.

So the brief became: meet the customer where they already are, and remove the steps where a human was just typing the same answer for the hundredth time.

The architecture

We shipped two things. A Slack chat agent that lives in a Slack Connect channel with each customer, and a Notion database that became the single source of truth for every record about that onboarding.

The pieces:

Slack Bolt (TypeScript) as the surface.
The Anthropic API behind it, with a tool-use loop wired to seven internal tools.
Notion as the only durable state. The agent writes there, ops reads from there, the rest of the company subscribes via Notion's own automations.
Postgres for short-lived state (in-flight CSV validations, rate limits, audit log) on Fly.io.
An Intercom tool the agent calls when a human needs to step in.

The key design choice was to give the LLM tools, not freedom. The agent could not write prose answers about pricing or compliance. It could call get_onboarding_step, validate_csv, schedule_kickoff, create_notion_record, fetch_doc (the RAG call), escalate_to_human, and summarize_thread. Anything outside those seven verbs was out of scope.

Here is the relevant slice of the tool definition, lightly redacted:

const tools = [
  {
    name: "validate_csv",
    description:
      "Validate an uploaded employee CSV against the platform schema. " +
      "Returns row-level errors with line numbers and a fix suggestion per error.",
    input_schema: {
      type: "object",
      properties: {
        file_url: { type: "string", description: "Slack file URL." },
        client_id: { type: "string" }
      },
      required: ["file_url", "client_id"]
    }
  },
  {
    name: "escalate_to_human",
    description:
      "Hand the conversation to the on-call onboarder. Use when (a) the " +
      "customer asks something not covered by docs, (b) sentiment turns " +
      "negative, or (c) the same question has been asked twice without " +
      "progress.",
    input_schema: {
      type: "object",
      properties: {
        reason: { type: "string" },
        urgency: { type: "string", enum: ["low", "medium", "high"] }
      },
      required: ["reason", "urgency"]
    }
  }
]

The escalate_to_human tool was the most important one. Not the smartest call in the stack. The most important. Because the question every operations lead at the client kept asking us was, "How do I trust this thing not to lie to my customer?" The answer was: tell it, in the system prompt and in the tool description, to escalate early and often.
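One way to enforce "tools, not freedom" at the dispatch layer is to refuse any tool name outside the seven listed earlier and turn that refusal into an escalation. The sketch below is illustrative only: the tool names come from this article, while `dispatchTool`, `ToolHandler`, and `ToolResult` are hypothetical names, not the production code.

```typescript
// The seven tool names are from the article; every other name here is an assumption.
const TOOL_NAMES = [
  "get_onboarding_step",
  "validate_csv",
  "schedule_kickoff",
  "create_notion_record",
  "fetch_doc",
  "escalate_to_human",
  "summarize_thread",
] as const;

type ToolName = (typeof TOOL_NAMES)[number];
type ToolInput = Record<string, unknown>;
type ToolResult = { ok: boolean; output: string };
type ToolHandler = (input: ToolInput) => Promise<ToolResult>;

// Type guard: narrows an arbitrary string to one of the seven allowed names.
function isToolName(name: string): name is ToolName {
  return (TOOL_NAMES as readonly string[]).includes(name);
}

// Hypothetical dispatcher: an unknown tool name never runs, it becomes an escalation.
async function dispatchTool(
  handlers: Record<ToolName, ToolHandler>,
  name: string,
  input: ToolInput,
): Promise<ToolResult> {
  if (!isToolName(name)) {
    return handlers.escalate_to_human({
      reason: `Model requested unknown tool "${name}"`,
      urgency: "medium",
    });
  }
  return handlers[name](input);
}
```

With this shape, a request for something like `send_payment_reminder` would land with the on-call onboarder instead of failing silently or being improvised by the model.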

Notion as the only place truth lives

Half the work on a project like this is not the agent. It is deciding where state lives. We chose Notion for three reasons. Ops already knew it. The company already paid for it. And Notion's REST API is plain enough that we did not need an ORM.

Each new customer became a row in a single Notion database called Onboarding. Status field, owner field, kickoff date, CSV-validated boolean, SSO-configured boolean, training-done boolean, 30-day-check-in date. The agent wrote to that row. The ops team read from that row in the morning standup. Sales watched the same view filtered to "kickoff this week". Engineering watched the view filtered to "blocked on us".
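To make the row concrete, here is a rough sketch of how those fields could map onto the `properties` object of a Notion page update. The value shapes (`select`, `checkbox`, `date`) follow Notion's page property format, but the property names, the choice of a select for Status, and the `OnboardingRow` type are assumptions for illustration. The owner field is left out because a people property needs Notion user IDs.

```typescript
// Hypothetical shape of one row in the Onboarding database.
interface OnboardingRow {
  status: string;        // e.g. "Kickoff booked"; the status values are assumptions
  kickoffDate?: string;  // ISO 8601 date, e.g. "2026-02-09"
  csvValidated: boolean;
  ssoConfigured: boolean;
  trainingDone: boolean;
  checkInDate?: string;  // 30-day check-in, ISO 8601 date
}

// Build the `properties` payload for a Notion page update.
// Only fields present in `row` are included, so a partial update touches nothing else.
// Property names are assumed; adjust them to match the real database schema.
function toNotionProperties(row: Partial<OnboardingRow>): Record<string, unknown> {
  const props: Record<string, unknown> = {};
  if (row.status !== undefined) props["Status"] = { select: { name: row.status } };
  if (row.kickoffDate !== undefined) props["Kickoff date"] = { date: { start: row.kickoffDate } };
  if (row.csvValidated !== undefined) props["CSV validated"] = { checkbox: row.csvValidated };
  if (row.ssoConfigured !== undefined) props["SSO configured"] = { checkbox: row.ssoConfigured };
  if (row.trainingDone !== undefined) props["Training done"] = { checkbox: row.trainingDone };
  if (row.checkInDate !== undefined) props["30-day check-in"] = { date: { start: row.checkInDate } };
  return props;
}
```

Sending only the changed fields keeps each write small and avoids overwriting a value an onboarder edited by hand a minute earlier.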

We banned a second source of truth. No spreadsheets. No Linear tickets duplicating Notion rows. No "I will just track this one in my head". When something lived in two places, it lived in zero.

Warning

If your chat agent and your ops team disagree about what state a customer is in, your agent loses. Every time. The fix is not better prompts. The fix is one database both sides write to.

The agent loop

The Slack agent runs a fairly standard tool-use loop. New message arrives. We build a context window from the last twenty messages in the thread, the customer's Notion row, and the relevant chunks from the docs RAG. The model picks a tool. We run it. We feed the result back. Repeat until the model emits text, or until five tool calls have passed, at which point we force a wrap-up.
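The loop described above can be sketched in a few lines. This is a simplified version under stated assumptions: `callModel` and `runTool` are hypothetical wrappers injected by the caller (no real SDK calls appear here), and the fallback sentence is invented for the example. Only the cap of five tool calls comes from the article.

```typescript
type ToolCall = { kind: "tool_call"; name: string; input: Record<string, unknown> };
type TextReply = { kind: "text"; text: string };
type ModelTurn = ToolCall | TextReply;

// One completed tool call, fed back to the model on the next iteration.
type Step = { tool: string; result: string };

interface LoopDeps {
  // Hypothetical model wrapper. `forceText` asks for a text-only wrap-up.
  callModel(context: string, steps: Step[], forceText: boolean): Promise<ModelTurn>;
  // Hypothetical tool runner that returns the tool result as a string.
  runTool(name: string, input: Record<string, unknown>): Promise<string>;
}

const MAX_TOOL_CALLS = 5; // the cap described in the article

async function runAgentTurn(context: string, deps: LoopDeps): Promise<string> {
  const steps: Step[] = [];
  while (steps.length < MAX_TOOL_CALLS) {
    const turn = await deps.callModel(context, steps, false);
    if (turn.kind === "text") return turn.text; // the model answered, so we are done
    const result = await deps.runTool(turn.name, turn.input);
    steps.push({ tool: turn.name, result });
  }
  // Tool budget spent: force a wrap-up instead of looping further.
  const final = await deps.callModel(context, steps, true);
  return final.kind === "text" ? final.text : "I am handing this over to a teammate.";
}
```

Injecting the model and tool functions keeps the loop testable without network access, which matters when the cap itself is one of the behaviours you want to verify.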

Two non-obvious choices made it work.

First, every model response that contained customer-facing prose went through a second, cheaper model pass that we used as a sanity check. The check answered three questions. Does this contradict the docs we shipped in the context? Does this make a promise the agent does not have permission to make? Is this in the customer's language? If any check failed, we routed to a human and did not send the draft. The pass cost about a tenth of a cent per message and caught roughly one bad response per hundred.
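The gating step after that second pass is simple enough to show. In this sketch the checker model's answers are assumed to arrive as three booleans; `DraftCheck`, `Delivery`, and `gateDraft` are illustrative names, and how the checker is prompted is out of scope here.

```typescript
// Answers to the three questions, as returned by the cheaper checker model.
// Field names are assumptions for this example.
interface DraftCheck {
  contradictsDocs: boolean;
  makesUnauthorizedPromise: boolean;
  inCustomerLanguage: boolean;
}

type Delivery =
  | { action: "send"; text: string }
  | { action: "escalate"; reasons: string[] };

// Send the draft only if all three checks pass; otherwise route to a human
// and carry the failed checks along as reasons.
function gateDraft(draft: string, check: DraftCheck): Delivery {
  const reasons: string[] = [];
  if (check.contradictsDocs) reasons.push("contradicts docs");
  if (check.makesUnauthorizedPromise) reasons.push("unauthorized promise");
  if (!check.inCustomerLanguage) reasons.push("wrong language");
  return reasons.length === 0
    ? { action: "send", text: draft }
    : { action: "escalate", reasons };
}
```

Keeping the reasons on the escalation means the onboarder who picks it up sees why the draft was held back, not just that it was.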

Second, the agent never made a decision the ops team had not approved as a class. New question type. Escalate. Customer asking for a refund. Escalate. CSV validation passed but the customer said "wait, I think I sent the wrong file". Escalate. The agent's job was to handle the ninety percent of messages that were the same five questions, not to be clever about the long tail.

What broke first

Two things broke in the first month, and neither one was the model.

The first was Slack Connect itself. About one in eight customer Slack workspaces had Connect disabled at the org level, and getting their IT team to flip it on added a four-day delay we had not budgeted for. We solved it by shipping an email fallback channel using the same agent backend, so a customer who could not enable Connect still got the chat experience through threaded email replies. About fifteen percent of customers ended up on the email path.

The second was the Notion API rate limit. Notion caps you at roughly three requests per second per integration. With the agent writing to the database on every state change, plus ops reading dashboards, we hit the cap during morning standups and the agent started silently dropping writes. The fix was a small write-coalescing queue in Postgres that batched Notion writes into one update per row per ten seconds. Boring infrastructure work, but it was the difference between an agent the team trusted and one they did not.
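The coalescing idea itself fits in a small class. Note the swap: the queue described above lived in Postgres, while this sketch keeps pending writes in an in-memory `Map` purely to show the merge logic. `WriteCoalescer` and its method names are hypothetical, and the clock is passed in as a parameter so the behaviour is easy to test.

```typescript
type Patch = Record<string, unknown>;

// In-memory stand-in for the Postgres-backed queue: at most one pending entry per row.
class WriteCoalescer {
  private readonly windowMs: number;
  private pending = new Map<string, { patch: Patch; firstQueuedAt: number }>();

  constructor(windowMs: number = 10000) {
    this.windowMs = windowMs; // ten seconds by default, as in the article
  }

  // Merge a new patch into whatever is already waiting for this row; later fields win.
  enqueue(rowId: string, patch: Patch, now: number): void {
    const existing = this.pending.get(rowId);
    if (existing) {
      existing.patch = { ...existing.patch, ...patch };
    } else {
      this.pending.set(rowId, { patch: { ...patch }, firstQueuedAt: now });
    }
  }

  // Return and remove every row whose window has elapsed: one merged update per row.
  takeDue(now: number): Array<{ rowId: string; patch: Patch }> {
    const due: Array<{ rowId: string; patch: Patch }> = [];
    this.pending.forEach((entry, rowId) => {
      if (now - entry.firstQueuedAt >= this.windowMs) {
        due.push({ rowId, patch: entry.patch });
        this.pending.delete(rowId);
      }
    });
    return due;
  }
}
```

A worker would call `takeDue(Date.now())` on a timer and send one Notion update per returned row. Coalescing alone does not guarantee staying under the cap, so that worker would still need to pace its requests and retry when Notion answers with a rate-limit error.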

If you are building anything that talks to the Notion API at volume, read the rate-limit page before you write a line of code. They are honest about it. Most projects find out the hard way.

The numbers after three months

We do not love vanity metrics, but here are the three the ops lead checks every Monday.

The median time from "kickoff booked" to "first successful payroll run" was fourteen days when we started. Three months in, it sits at four days. The improvement came mostly from removing dead time between human steps, not from the agent being clever. The agent does not wait for someone to come back from lunch.

The onboarding team went from six full-time onboarders to two. Not through layoffs. The other four moved to a new function the company had wanted to staff for a year: a higher-touch implementation team for the customers above 50 employees. Same headcount, more revenue per head, less Monday queue.

And the agent handles, depending on the week, between seventy-five and eighty-five percent of inbound messages on the onboarding channel without a human ever responding. The remaining slice is exactly the work the human team wants to be doing.

What we would do differently

Three things, plainly.

We would build the email fallback in week one, not week three. The customers who could not enable Slack Connect were our slowest accounts, and they were the ones who needed the agent most.

We would log every tool call as a structured event from day one. We added that in month two and it transformed the kind of question we could ask the data ("how often does the validate_csv tool fail on field X" instead of "did the agent work this week").
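A structured event does not need much. The sketch below shows one possible event shape as a JSON line, plus the kind of query it enables. The field names, `formatToolCallEvent`, and `failuresByField` are assumptions for illustration, not the schema we shipped.

```typescript
// One log line per tool call. All field names are illustrative.
interface ToolCallEvent {
  ts: string;          // ISO 8601 timestamp
  clientId: string;
  tool: string;
  ok: boolean;
  durationMs: number;
  errorField?: string; // e.g. the CSV column that failed validation
}

// Serialize an event as a single JSON line, ready for any log pipeline.
function formatToolCallEvent(event: ToolCallEvent): string {
  return JSON.stringify(event);
}

// Answer "how often does tool X fail on field Y" from a batch of logged lines.
function failuresByField(lines: string[], tool: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of lines) {
    const event = JSON.parse(line) as ToolCallEvent;
    if (event.tool !== tool || event.ok || event.errorField === undefined) continue;
    counts.set(event.errorField, (counts.get(event.errorField) ?? 0) + 1);
  }
  return counts;
}
```

Once every call is a row with a tool name and an outcome, the weekly question shifts from whether the agent worked to which field, for which client, failed how often.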

And we would push back harder, earlier, on the temptation to add an eighth tool. Every tool you add is a new decision boundary the model can blur. The seven we shipped were enough. The eighth, which we proposed and then cut, was a "send a payment reminder" tool that had no business being in an onboarding agent.

The smallest thing you could do today

If you run an ops team that lives across Slack, email, and a knowledge base, the five-minute audit is this. Open last week's customer thread inbox. For each thread, write one of three letters in the margin. R for "an answer that was in our docs". S for "an answer that needed a human and could not have been automated". B for "boring repetitive work the human did because no one wired the tool".

Count the R and B letters. That is the slice an agent like the one we built can take off your team's desk.
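If you would rather tally the letters in a script than in the margin, the count is a one-liner. The function name and the sample labels below are made up for the example.

```typescript
// R = answer was in the docs, S = needed a human, B = boring repetitive work.
type Label = "R" | "S" | "B";

// Share of threads (0 to 1) that an agent could plausibly take over: R plus B.
function automatableShare(labels: Label[]): number {
  if (labels.length === 0) return 0;
  const automatable = labels.filter((l) => l === "R" || l === "B").length;
  return automatable / labels.length;
}

// Four sample threads: two R, one B, one S.
const share = automatableShare(["R", "R", "B", "S"]); // 0.75
```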

While building this chat agent, we kept running into the same lesson: the agent's accuracy mattered less than its escalation rules. We acted on it by writing the escalation policy in the same Notion database the ops team already trusted, which is how we approach AI agents at ABN: start from escalation, work backwards to the prompt.

Key takeaway

Your chat agent's accuracy matters less than its escalation rules. Write the escalation policy where the ops team can edit it without a deploy.

FAQ

Why Slack instead of a custom onboarding portal?

The customers already had Slack open. A seventh tab would have lowered completion. Meeting customers where they already work beats building a new surface they have to learn.

Why Notion as the source of truth and not a database engineering owned?

Ops already trusted Notion and read it every morning. Picking the tool the humans use beats picking the technically cleanest tool. The agent writes to the same row ops reads from.

How do you stop an LLM-driven agent from making promises it should not make?

Constrain it with tools, not prose. Seven tightly scoped tools, a second cheaper model pass as a sanity check, and an escalation policy the ops team can edit without a redeploy.

What is the most common mistake when wiring an agent into Notion at volume?

Ignoring the rate limit. Roughly three requests per second per integration is easy to hit. Use a write-coalescing queue in Postgres and batch updates to one per row per ten seconds.

Did the six onboarders lose their jobs?

No. Two stayed on onboarding. The other four moved to a higher-touch implementation team for customers above 50 employees, a function the company had wanted to staff for a year.
