Chat agent leak: the Slack incident and our staging gate

Thursday, 14:11. A draft press release lands in a public Slack channel. The chat agent was supposed to DM it. Here is what failed and the gate we now run before staging.

Jacob Molkenboer · Founder · A Brand New Company · 17 Jun 2026 · 8 min

Thursday afternoon, 14:11. An account manager at a 24-person marketing agency in Amersfoort opens Slack. In #general, between a parking-spot question and a Loom thumbnail, sits an unreleased draft press release for a client they had been onboarding for six weeks. Embargoed. Quoted exec. Internal pricing line that should never have left a DM. Posted by the chat agent.

The agent had been live in production for eleven days.

The eleven-day-old agent

The agency runs a small but dense operations stack: HubSpot, Notion, Google Workspace, Slack. Their chat agent does what most agencies want a chat agent to do. It drafts client follow-ups, surfaces overdue invoices, and posts internal status pings into the right DM when a campaign asset crosses a milestone. It does not post to public channels. That was the spec.

The system worked. For ten and a half days, it worked very well. The team kept a Notion page with the agent's wins. It saved roughly four hours a week per account manager, which the agency owner counted in real money.

Then Thursday happened.

What actually broke

The agent calls the Slack chat.postMessage endpoint through a thin tool-wrapper we had written around the official Slack Web API. Slack's documentation is explicit: channel is a required parameter. You cannot post a message without one.

Our wrapper, however, was not explicit. Three things lined up:

The JSON schema we exposed to the model marked channel_id as "required": false. A copy-paste from an internal demo where the wrapper hard-coded a sandbox channel.
The wrapper itself had a fallback: channel_id = payload.channel_id ?? config.default_channel. The default, set during that same demo, was #general. Nobody had removed it.
The system prompt told the agent that if no channel was specified, it should post to the team's default channel.

Three different sources of permissiveness, each individually defensible.

On the Thursday in question, the agent was asked to draft a press release and send it to the brand lead for review. The model produced a tool call with recipient: "Marije", which the wrapper was meant to translate into a DM. The schema did not require channel_id. The model omitted it. The wrapper applied the default. #general received a press release.
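The failure path above can be replayed in a few lines. This is a minimal sketch and not the original wrapper: the PostMessagePayload type, the resolveChannelLegacy and resolveChannelStrict functions, and the config object are hypothetical names made up for illustration. Only the ?? fallback and the #general default come from the incident itself.

```typescript
// Hypothetical reconstruction of the pre-incident routing logic.
interface PostMessagePayload {
  recipient?: string;  // e.g. "Marije"; meant to be resolved to a DM
  channel_id?: string; // optional in the schema: the first mistake
  text: string;
}

// Leftover demo config (assumed shape).
const config = { default_channel: "#general" };

// Legacy behaviour: a missing channel_id silently becomes the default.
function resolveChannelLegacy(payload: PostMessagePayload): string {
  return payload.channel_id ?? config.default_channel;
}

// Same input with the fallback removed: the call fails loudly instead.
function resolveChannelStrict(payload: PostMessagePayload): string {
  if (payload.channel_id === undefined) {
    throw new Error("channel_id is required; refusing to guess a destination");
  }
  return payload.channel_id;
}

// The tool call the model produced on Thursday, minus the real text.
const toolCall: PostMessagePayload = {
  recipient: "Marije",
  text: "DRAFT press release",
};

console.log(resolveChannelLegacy(toolCall)); // the public default channel
```

With the strict variant, the same tool call raises an error that shows up in the logs, and nothing is posted anywhere.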

Warning

An LLM treats a required field as required. It treats an optional field with a default as an invitation to omit it. Your schema is your contract; the prompt is a suggestion on top.

Why required is not enough

Reading the post-mortem back, we concluded that the failure was not the model. The model behaved exactly as the schema invited it to. It is the same class of issue that OWASP catalogues as Excessive Agency in its Top 10 for LLM Applications: a tool has more capability than the task requires, and the agent eventually exercises it.

What surprised us, and what is worth flagging for anyone running a customer-facing chat agent in 2026, is how mundane the failure path was. There was no jailbreak. No prompt injection. No clever exfiltration. A required field was marked optional and a default was set to a public channel. Fixing either one would have prevented the incident. Together they shipped a press release to twenty-four people.

We pulled the message within ninety seconds. The client was told the same day. The agency's owner handled it well. The damage was reputational, contained, and survivable. We were lucky.

The three-step gate we now run

We do not ship a customer-facing chat agent to staging without passing three checks. The gate is short on purpose. It exists so that the kind of failure we saw on Thursday cannot reach a client.

1. Schema strictness

Every tool the agent can call gets its schema re-derived from a single source of truth. We pin additionalProperties: false, mark every field the upstream API treats as required as required in our wrapper too, and remove every fallback default for routing fields (channel, recipient, address, account_id). The check is automated. CI fails the build if a wrapper exposes a routing field as optional.

// tools/slack/post-message.ts
import { z } from "zod";
import { slack } from "../../clients/slack";

export const postMessageSchema = z
  .object({
    channel_id: z.string().regex(/^[CDG][A-Z0-9]{8,}$/),
    text: z.string().min(1),
    thread_ts: z.string().optional(),
  })
  .strict();

export async function postMessage(input: unknown) {
  const args = postMessageSchema.parse(input); // throws on missing channel_id
  return slack.chat.postMessage({
    channel: args.channel_id,
    text: args.text,
    thread_ts: args.thread_ts,
  });
}

The Zod schema is what we expose to the model, what the wrapper validates against, and what CI inspects. One artefact, three uses. There is no place for a routing field to drift into optional.
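The CI check can be sketched without any dependencies. It assumes each tool's schema is also available as a plain JSON-Schema-style object, which is an assumption about the build and not something described above. The ToolSchema type, the ROUTING_FIELDS list, and findSchemaViolations are hypothetical names.

```typescript
// Assumed minimal shape of a tool schema as exposed to the model.
interface ToolSchema {
  name: string;
  properties: Record<string, { type: string; default?: unknown }>;
  required?: string[];
  additionalProperties?: boolean;
}

// Routing fields named in the gate; extend per project.
const ROUTING_FIELDS = ["channel", "channel_id", "recipient", "address", "account_id"];

// Returns one message per violation; an empty array means the schema passes.
function findSchemaViolations(schema: ToolSchema): string[] {
  const violations: string[] = [];
  const required = schema.required ?? [];

  if (schema.additionalProperties !== false) {
    violations.push(`${schema.name}: additionalProperties must be false`);
  }
  for (const field of Object.keys(schema.properties)) {
    if (ROUTING_FIELDS.indexOf(field) === -1) continue; // not a routing field
    if (required.indexOf(field) === -1) {
      violations.push(`${schema.name}: routing field "${field}" is optional`);
    }
    if ("default" in schema.properties[field]) {
      violations.push(`${schema.name}: routing field "${field}" has a default`);
    }
  }
  return violations;
}

// A schema shaped like the one that caused the incident.
const leakySchema: ToolSchema = {
  name: "slack.postMessage",
  properties: {
    channel_id: { type: "string", default: "#general" },
    text: { type: "string" },
  },
  required: ["text"],
};

// The same tool after the fix.
const strictSchema: ToolSchema = {
  name: "slack.postMessage",
  properties: { channel_id: { type: "string" }, text: { type: "string" } },
  required: ["channel_id", "text"],
  additionalProperties: false,
};
```

In a CI job, a non-empty result for any tool would fail the build. The leaky schema trips all three rules and the strict one trips none.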

2. Permission matrix

The second check is an explicit allow-list. For every agent we ship, we write a matrix of which tools may be called against which targets. It is boring, declarative, and impossible for the model to override.

# agents/agency-bot/permissions.yaml
agent: agency-bot
slack:
  chat.postMessage:
    channels:
      allow:
        - "^D[A-Z0-9]+$"        # direct messages only
      deny:
        - "^C[A-Z0-9]+$"        # all public channels
        - "^G[A-Z0-9]+$"        # all private channels
hubspot:
  contacts.update:
    fields:
      allow: ["lifecycle_stage", "last_contacted"]
      deny: ["email", "phone"]

The wrapper checks the matrix before every call. If the agent tries to post a message to a public channel, the call is refused at the wrapper, the model is told it was refused, and the attempt is logged. The model does not get to decide whether the rule applies. Anthropic's tool use guide is clear that a model can be steered but not bound by instructions alone; the binding has to live in the surrounding code.
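A minimal sketch of that wrapper-side check, assuming the YAML above has already been loaded into a plain object. The TargetRule and Decision types and the checkTarget function are hypothetical names. The sketch evaluates deny patterns first and refuses anything that matches no allow pattern, which is one reasonable reading of the matrix and not necessarily the only one.

```typescript
// One allow/deny rule, mirroring a `channels:` entry in the permission matrix.
interface TargetRule {
  allow: string[]; // regex sources
  deny: string[];  // regex sources
}

interface Decision {
  allowed: boolean;
  reason?: string; // set only when the call is refused
}

// The Slack rule from the YAML above, assumed to be parsed already.
const slackPostMessageRule: TargetRule = {
  allow: ["^D[A-Z0-9]+$"],
  deny: ["^C[A-Z0-9]+$", "^G[A-Z0-9]+$"],
};

// Deny wins over allow; a target that matches nothing is refused by default.
function checkTarget(rule: TargetRule, target: string): Decision {
  const matches = (pattern: string): boolean => new RegExp(pattern).test(target);

  const denied = rule.deny.filter(matches);
  if (denied.length > 0) {
    return { allowed: false, reason: `target matches deny pattern ${denied[0]}` };
  }
  if (rule.allow.some(matches)) {
    return { allowed: true };
  }
  return { allowed: false, reason: "target matches no allow pattern" };
}
```

The default-refuse branch matters as much as the deny list: a channel name such as #general matches neither pattern set, and is refused all the same.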

3. Outbound rehearsal

The third check is the one most teams skip. Before an agent goes to staging, we replay a corpus of one to two hundred prompts, drawn from real conversation history of the kind the agent will face. We log every tool call the model would have made and check the targets against the permission matrix. The corpus contains adversarial prompts, accidental ambiguities, and a small set of prompts that previously caused incidents on other agents.

The rehearsal is cheap. It runs in a sandbox account with no network reach into customer systems. It catches the boring failures: a wrapper that mis-resolves a recipient, a system prompt that mentions a default no longer present, a tool exposed to the model that nobody remembered to remove. Roughly one in twelve rehearsal runs flags a real issue. That ratio has held steady across the fourteen agents we now run in production.
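The comparison step of a rehearsal can be sketched as a pure function over logged tool calls. The sketch assumes the sandbox run has already produced a list of recorded calls. RecordedCall, Violation, AllowMatrix, and rehearse are hypothetical names, and the matrix is reduced to allow patterns per tool for brevity.

```typescript
// One tool call the model attempted during a sandbox replay (assumed log shape).
interface RecordedCall {
  promptId: string;
  tool: string;
  target: string;
}

interface Violation {
  promptId: string;
  tool: string;
  target: string;
  reason: string;
}

// Tool name -> patterns a target must match to be allowed.
type AllowMatrix = Record<string, RegExp[]>;

// Returns every recorded call that would have hit a target outside the matrix.
function rehearse(calls: RecordedCall[], matrix: AllowMatrix): Violation[] {
  const violations: Violation[] = [];
  for (const call of calls) {
    const allow = matrix[call.tool];
    if (allow === undefined) {
      // Catches a tool that is exposed to the model but was never permitted.
      violations.push({ ...call, reason: "tool not in permission matrix" });
    } else if (!allow.some((pattern) => pattern.test(call.target))) {
      violations.push({ ...call, reason: "target not allowed" });
    }
  }
  return violations;
}

// Example run with three recorded calls (illustrative data, not real logs).
const matrix: AllowMatrix = { "slack.chat.postMessage": [/^D[A-Z0-9]+$/] };
const recorded: RecordedCall[] = [
  { promptId: "p1", tool: "slack.chat.postMessage", target: "D012ABCDEF" },
  { promptId: "p2", tool: "slack.chat.postMessage", target: "C012ABCDEF" },
  { promptId: "p3", tool: "hubspot.contacts.delete", target: "contact-42" },
];
const report = rehearse(recorded, matrix);
```

Any non-empty report blocks the move to staging. Keeping promptId on each violation makes it easy to trace a flagged call back to the prompt that triggered it.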

Takeaway

The model is not the perimeter. The wrapper is. If your wrapper has a fallback default for a routing field, your agent has a leak waiting for the right Thursday.

What we changed in the wrapper

For the Amersfoort agent, the fix took an afternoon. We deleted the default_channel field from config. We made channel_id required in the Zod schema. We added the permission matrix and wired the wrapper to enforce it. The rehearsal corpus took a day to assemble from the agent's first eleven days of logs. Total time to a re-shippable state: forty hours, including the client call and the internal post-mortem.

The agency kept the agent. Their owner was, fairly, more interested in the gate than the apology. We now run that gate on every chat agent we build, regardless of client size. It is the kind of process that reads as overkill when you read about it and as obvious when you live through the alternative.

What you can do this afternoon

If you run any agent that calls a tool with a routing field (channel, recipient, account, customer_id), grep your wrapper for default, fallback, and ??. Any line where a routing field is assigned a fallback is a candidate for the same incident we saw on Thursday. Remove the fallback. Make the field required in the schema. Let the model fail loudly when it forgets, because a loud failure surfaces in your logs and a silent default surfaces in your client's Slack.
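The same search can be scripted. The sketch below is a line-based heuristic, not a parser: it flags any line that mentions both a routing field and one of the three fallback markers, so expect false positives and review each hit by hand. The Finding type and findRoutingFallbacks are hypothetical names.

```typescript
interface Finding {
  line: number; // 1-based line number in the scanned source
  text: string; // the trimmed offending line
}

// Routing field names and fallback markers taken from the paragraph above.
const ROUTING_FIELD = /channel|recipient|account|customer_id/i;
const FALLBACK_MARKER = /default|fallback|\?\?/i;

// Flags lines that mention a routing field together with a fallback marker.
function findRoutingFallbacks(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    if (ROUTING_FIELD.test(text) && FALLBACK_MARKER.test(text)) {
      findings.push({ line: index + 1, text: text.trim() });
    }
  });
  return findings;
}

// Illustrative wrapper source; only the second line should be flagged.
const sample = [
  "const text = payload.text;",
  "const channel_id = payload.channel_id ?? config.default_channel;",
  "const retries = options.retries ?? 3;",
].join("\n");

const findings = findRoutingFallbacks(sample);
```

A fallback on a non-routing value such as retries is left alone, which keeps the output short enough to read in one sitting.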

When we built the chat agent for the Amersfoort agency, the failure that taught us this gate was a single line of permissive config left over from a demo six months earlier. We solved it by removing the line, tightening the schema, and writing the rehearsal that now runs on every AI agent we ship.

Key takeaway

A chat agent's perimeter is its tool wrapper, not its system prompt. If the wrapper has a fallback for a routing field, the leak is already scheduled.

FAQ

How does a chat agent end up posting to a channel it shouldn't reach?

Almost always through a permissive tool wrapper. The system prompt is a guideline; the wrapper is the gate. If the wrapper has a fallback default for the channel field, the agent will eventually hit it.

Isn't a strict system prompt enough to keep the agent in line?

No. Models follow prompts probabilistically. They follow code unconditionally. Any binding rule about which tools can be called against which targets has to live in the wrapper, not the prompt.

How much does the three-step gate add to a typical agent project?

About one to two days of engineering up front and roughly thirty minutes per release. It pays for itself the first time the rehearsal flags a regression before staging.

What does outbound rehearsal look like in practice?

Replay one to two hundred real prompts through the agent in a sandbox, log every tool call it tries, and compare each target to the permission matrix. Any call that would hit a denied target is a bug to fix before staging.

Could a JSON schema with additionalProperties false have prevented this?

It would have helped, but the real fix is marking routing fields as required and deleting fallback defaults in the wrapper. Strictness has to live in both the schema and the surrounding code.

chat agents · ai agents · integrations · security · architecture · operations
