Email automation

Bullhorn email agent: wiring Gmail replies into an ATS

An Amsterdam recruiting firm runs 84 candidate replies a day through one inbox. Here is the exact wiring we used to thread, tag, and pre-draft them inside Bullhorn.

Jacob Molkenboer · Founder · A Brand New Company · 8 Jun 2026 · 9 min

It is 19:14 on a Tuesday in Amsterdam Zuid. A recruiter at a thirty-three-person firm has 84 unread candidate replies, Bullhorn open on the left monitor, Gmail on the right, and a spreadsheet where she pastes candidate names so she can find them in the ATS again. The reply that matters most is from a Java engineer she shortlisted yesterday. It is buried under three out-of-office bounces, a salary question, and a thread where the candidate looped in his wife by mistake. She will work until 21:00. She did the same on Monday.

This is the post about how we wired a Gmail-resident email agent into Bullhorn so that the Tuesday-night queue stopped existing. No slide-deck overview. The actual moving parts, the actual gotchas, and the code where the code is the interesting bit.

What was actually slow

The instinct, when a recruiting firm says email is killing them, is to assume volume. Volume was fine. 84 replies a day across the team is a normal number. What was killing them was four invisible costs.

First, every reply needed a context switch back into Bullhorn to find the candidate record. Second, threading was broken in roughly 20% of cases because candidates replied from a different address than the one in Bullhorn. Third, recruiters wrote the same five replies all day (asking for CV updates, confirming availability, declining politely). Fourth, the firm had no idea which threads were warm because nothing was tagged.

None of those are AI problems. Three of them are plumbing. One of them, the canned replies, is where a small model earns its keep. We started with the plumbing.

Push notifications from Gmail, not polling

The first thing every blog post about Gmail automation gets wrong is polling. You do not poll Gmail. You set up push notifications through Cloud Pub/Sub and let Gmail push history events at you. Polling burns quota, lags by minutes, and loses the race against a user who archives or answers a message before your loop comes around.

// Register the watch once per user. Pub/Sub does the rest.
await gmail.users.watch({
  userId: 'recruiter@firm.nl',
  requestBody: {
    topicName: 'projects/abn-ats-bridge/topics/gmail-inbox',
    // SENT is watched too, so a sent draft can trigger the Bullhorn note update.
    labelIds: ['INBOX', 'SENT'],
    labelFilterAction: 'include',
  },
})

The watch returns a historyId. Store it per recruiter. When Pub/Sub fires, you call users.history.list with the stored ID as startHistoryId, collect the message IDs that were added, and process them. The watch expires after seven days. Renew it every four. Set an alert if the last successful renewal is more than five days old, because watch renewal failures are silent.
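A minimal sketch of that notification handler, under stated assumptions: `gmail` is a googleapis-style client passed in by the caller, and `cursors` is a hypothetical Map of recruiter address to the last processed historyId (seeded from the watch response). Neither name comes from the production code.

```javascript
// Decode the Pub/Sub push body Gmail sends: { emailAddress, historyId }.
function decodeNotification(pubsubBody) {
  const json = Buffer.from(pubsubBody.message.data, 'base64').toString('utf8')
  return JSON.parse(json)
}

// Collect IDs of messages added since the stored cursor, following pagination.
async function collectNewMessageIds(gmail, cursors, pubsubBody) {
  const { emailAddress, historyId } = decodeNotification(pubsubBody)
  const ids = new Set() // a message can appear in more than one history record
  let latest = historyId
  let pageToken
  do {
    const res = await gmail.users.history.list({
      userId: emailAddress,
      startHistoryId: cursors.get(emailAddress),
      historyTypes: ['messageAdded'],
      pageToken,
    })
    for (const entry of res.data.history ?? []) {
      for (const added of entry.messagesAdded ?? []) ids.add(added.message.id)
    }
    latest = res.data.historyId ?? latest
    pageToken = res.data.nextPageToken
  } while (pageToken)
  // Advance the cursor only after every page was read without an error.
  cursors.set(emailAddress, String(latest))
  return [...ids]
}
```

The cursor starts from the stored ID rather than the one in the notification, so a dropped or duplicated push does not skip messages. If the stored ID is too old, Gmail answers history.list with a 404 and a full resync is needed; that path is left out of the sketch.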

Warning

The Gmail watch response gives you an expiration in milliseconds since the epoch, not seconds. 1717891200000 is 9 June 2024; read it as seconds and the watch appears to expire tens of thousands of years from now. We lost a Sunday on this.
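A small sketch of parsing that value safely. It assumes the expiration arrives as a string, which is how the googleapis Node client types it; the function names and the three-day renewal margin are illustrative, not taken from the production code.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000

// `expiration` is the value from the watch response, in epoch milliseconds.
function watchExpiresAt(expiration) {
  const ms = Number(expiration)
  if (!Number.isFinite(ms)) throw new Error(`bad expiration: ${expiration}`)
  return new Date(ms) // no "* 1000": the value is already in milliseconds
}

// True when fewer than `minDaysLeft` days remain before the watch lapses.
function needsRenewal(expiration, now = new Date(), minDaysLeft = 3) {
  return watchExpiresAt(expiration).getTime() - now.getTime() < minDaysLeft * DAY_MS
}

console.log(watchExpiresAt('1717891200000').toISOString())
// 2024-06-09T00:00:00.000Z
```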

Matching the candidate inside Bullhorn

The Bullhorn REST API is fine once you accept that authentication is a three-step OAuth ritual and the access token lives for ten minutes. Wrap it in a refresh-on-401 client and stop thinking about it.
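One way to sketch such a refresh-on-401 wrapper. `refreshToken` and `request` are hypothetical callbacks supplied by the caller (the real Bullhorn login flow is not shown), and the shared in-flight promise is there so parallel requests that all see a 401 trigger a single refresh.

```javascript
// Wraps a request function so a 401 triggers one shared token refresh.
function createAuthedClient({ refreshToken, request }) {
  let token = null
  let refreshing = null // in-flight refresh promise, shared by concurrent callers

  function refreshOnce() {
    if (!refreshing) {
      refreshing = refreshToken()
        .then((fresh) => { token = fresh; return fresh })
        .finally(() => { refreshing = null })
    }
    return refreshing
  }

  return async function call(path, options = {}) {
    if (!token) await refreshOnce()
    const used = token
    let res = await request(path, options, used)
    if (res.status === 401) {
      // Skip the refresh if another caller already replaced the token.
      if (token === used) await refreshOnce()
      res = await request(path, options, token)
    }
    return res
  }
}
```

Creating one client per recruiter keeps each recruiter's token and refresh state separate.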

Matching a reply to a candidate has two paths. Email is the obvious one. Bullhorn candidates carry up to three email fields, so a sender match has to check all three.

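async function findCandidate(fromEmail) {
  // The sender address is untrusted input: quote it and escape " and \ for Lucene.
  const addr = `"${fromEmail.trim().replace(/(["\\])/g, '\\$1')}"`
  const q = `email:${addr} OR email2:${addr} OR email3:${addr}`
  const res = await bh.get('/search/Candidate', {
    params: {
      query: q,
      fields: 'id,firstName,lastName,status,owner',
      count: 5,
    },
  })
  // Assumes bh.get resolves to the Bullhorn response body, whose `data` is the hit list.
  return res.data?.[0] ?? null
}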

For the 20% where the candidate replies from a personal Gmail (the alias they did not give you), the fallback is the original outgoing thread. Every recruiter email out of Bullhorn or Gmail carries a Message-ID. If the inbound has an In-Reply-To pointing at one we sent, the candidate is whoever we addressed last on that thread. That recovers 18 of those 20 percentage points. The residual 2% gets flagged for human triage with a "Possible candidate" suggestion based on signature parsing.
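The two matching paths and the triage fallback can be sketched as one function. `findByEmail` and `sentIndex` are hypothetical: a lookup such as the candidate search above, and a Map of outgoing Message-ID to the candidate addressed, recorded when the recruiter's mail went out. Signature parsing is left out.

```javascript
// Resolve the candidate for an inbound message: sender match first,
// then the thread we sent, then human triage.
async function resolveCandidate(inbound, { findByEmail, sentIndex }) {
  const direct = await findByEmail(inbound.from)
  if (direct) return { candidateId: direct.id, via: 'email' }

  // Fallback: the reply points at a Message-ID we sent to a known candidate.
  const sent = sentIndex.get(inbound.inReplyTo)
  if (sent) return { candidateId: sent.candidateId, via: 'thread' }

  return { candidateId: null, via: 'triage' }
}
```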

Threading, the part everyone gets wrong

Gmail does not thread by subject line alone. It threads by the RFC 5322 reference chain: Message-ID, In-Reply-To, and References. If you send a draft and forget the References header, Gmail will visually orphan the reply even though the subject is identical. Bullhorn does its own version of this with internal Note IDs.

// What a properly threaded reply needs in its raw RFC822
Message-ID: <7f1a0b3c@firm.nl>
In-Reply-To: <9c2e4a11@gmail.com>
References: <a18b27@firm.nl> <9c2e4a11@gmail.com>
Subject: Re: Java backend role at [client]

The agent maintains a small lookup table keyed by Message-ID that points at the Bullhorn Note ID. When a reply arrives, we append the candidate's text to that note instead of creating a new one. The recruiter opens the candidate record and sees the full conversation in chronological order without leaving Bullhorn.
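A sketch of that lookup, assuming an in-memory Map called `noteIndex` (hypothetical; a real deployment would persist it). Walking the References header newest-first when In-Reply-To misses is an addition for illustration, not something the setup above is confirmed to do.

```javascript
// Split a References header into individual Message-IDs.
function parseReferences(header = '') {
  return header.match(/<[^<>\s]+>/g) ?? []
}

// Find the Bullhorn Note ID for an inbound reply.
// `noteIndex` maps Message-ID -> Note ID.
function findNoteId(noteIndex, { inReplyTo, references }) {
  const candidates = [inReplyTo, ...parseReferences(references).reverse()]
  for (const id of candidates) {
    if (id && noteIndex.has(id)) return noteIndex.get(id)
  }
  return null // no known ancestor: the caller creates a new note
}

const noteIndex = new Map([['<a18b27@firm.nl>', 5501]])
console.log(findNoteId(noteIndex, {
  inReplyTo: '<9c2e4a11@gmail.com>',
  references: '<a18b27@firm.nl> <9c2e4a11@gmail.com>',
}))
// 5501
```

After the text is appended, the inbound message's own Message-ID should be added to the index so the next reply in the chain resolves on the first lookup.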

A small classifier earns its keep

Now the model. Not a large one. A small classifier. Reply intent at a recruiting firm is a closed set of six categories: interested, not interested, scheduling, out of office, salary question, recruiter handoff. We send the email body to a small fast model with a single classification instruction and a strict JSON output. No reasoning mode, no chain of thought, no creative writing. The classifier costs less than a tenth of a cent per reply and runs in about 600 milliseconds.

{
  "intent": "scheduling",
  "confidence": 0.92,
  "extracted": {
    "proposed_times": ["2026-06-10T14:00+02:00", "2026-06-11T10:00+02:00"],
    "preferred_channel": "phone"
  }
}

The intent becomes a Bullhorn tag on the candidate record. Recruiters can now filter for every interested Java candidate who replied this week in one click. That filter did not exist before the agent.
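Because the output is a closed set, it is cheap to validate before anything is written to Bullhorn. A minimal sketch: the snake_case labels (other than "scheduling", which appears in the example above) and the 0.8 triage threshold are assumptions for illustration.

```javascript
const INTENTS = new Set([
  'interested', 'not_interested', 'scheduling',
  'out_of_office', 'salary_question', 'recruiter_handoff',
])

// Parse the model's reply and reject anything outside the closed set.
function parseClassification(rawText, minConfidence = 0.8) {
  let parsed
  try {
    parsed = JSON.parse(rawText)
  } catch {
    return { ok: false, reason: 'invalid_json' }
  }
  const { intent, confidence, extracted = {} } = parsed ?? {}
  if (!INTENTS.has(intent)) return { ok: false, reason: 'unknown_intent' }
  if (typeof confidence !== 'number' || confidence < 0 || confidence > 1) {
    return { ok: false, reason: 'bad_confidence' }
  }
  // Low-confidence results are kept but routed to a human.
  return { ok: true, intent, confidence, extracted, needsTriage: confidence < minConfidence }
}
```

Anything that comes back with ok set to false, or with needsTriage set to true, would skip tagging and drafting and go to the recruiter untouched.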

An interesting side note: whether agents.md files actually help coding agents is being argued on Hacker News this week. We are not sure either, but the repo for this project has one anyway. It is short, it documents where the candidate-matching logic lives, and when we drop in three months from now to fix something, neither the human nor the next agent has to re-derive the structure.

The two-click draft

For the categories where the recruiter response is predictable (about 70% of replies), the agent generates a draft and saves it to Gmail. The recruiter opens the thread (click one), reads the draft, edits if needed, and hits Send (click two). No tab switching, no copy-paste, no Bullhorn round-trip. The note in Bullhorn gets updated when the draft is sent, via a Gmail push event on the SENT label.

// Save the draft on the recruiter's behalf. Gmail shows it inline.
const raw = buildRfc822({
  to: candidateEmail,
  from: recruiterEmail,
  subject: `Re: ${originalSubject}`,
  inReplyTo: originalMessageId,
  references: [...originalReferences, originalMessageId],
  body: draftedBody,
})

await gmail.users.drafts.create({
  userId: recruiterEmail,
  requestBody: {
    message: {
      threadId: gmailThreadId,
      raw: Buffer.from(raw).toString('base64url'),
    },
  },
})
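The buildRfc822 helper is not shown above. One possible shape, as a sketch only: it assumes ASCII header values (a non-ASCII subject would need RFC 2047 encoding), a plain-text UTF-8 body, and that Gmail fills in Date and Message-ID when the draft is sent.

```javascript
// Minimal plain-text RFC 5322 message builder.
function buildRfc822({ to, from, subject, inReplyTo, references = [], body }) {
  const headers = [
    `From: ${from}`,
    `To: ${to}`,
    `Subject: ${subject}`,
    `In-Reply-To: ${inReplyTo}`,
    `References: ${references.join(' ')}`,
    'MIME-Version: 1.0',
    'Content-Type: text/plain; charset="UTF-8"',
    'Content-Transfer-Encoding: base64',
  ]
  // A CR or LF inside a value would let the input smuggle in extra headers.
  for (const line of headers) {
    if (/[\r\n]/.test(line)) throw new Error('line break in header value')
  }
  // Base64 body, wrapped at 76 characters per line.
  const encodedBody = Buffer.from(body, 'utf8')
    .toString('base64')
    .replace(/(.{76})/g, '$1\r\n')
  return headers.join('\r\n') + '\r\n\r\n' + encodedBody
}
```

Encoding the body as base64 sidesteps line-length and 8-bit concerns for drafts that contain accented names.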

The drafted body is generated from a template per intent, then personalised. We tried fully open-ended generation in week one. The recruiters hated it. The drafts were too long, too formal, and full of "I hope this email finds you well". Switching to slot-filled templates with one personalised sentence per draft cut edit-time per reply from 40 seconds to 6 seconds in our timing logs.
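A minimal sketch of slot filling. The template text and slot names are invented for illustration; only `personalLine` would come from the model, and everything else from data already on the candidate record.

```javascript
// Hypothetical templates keyed by intent; {slots} are replaced at draft time.
const TEMPLATES = {
  scheduling: 'Hi {firstName},\n\n{personalLine}\n\nDoes {slot} work for a call?\n\n{recruiterName}',
  interested: 'Hi {firstName},\n\n{personalLine}\n\nCould you send me your latest CV?\n\n{recruiterName}',
}

function fillTemplate(intent, slots) {
  const template = TEMPLATES[intent]
  if (!template) return null // no template: leave the reply to the recruiter
  return template.replace(/\{(\w+)\}/g, (match, key) => {
    // Fail loudly rather than send a draft with a hole in it.
    if (!Object.hasOwn(slots, key)) throw new Error(`missing slot: ${key}`)
    return String(slots[key])
  })
}
```

Throwing on a missing slot means a half-filled draft never reaches the recruiter's inbox.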

Takeaway

The win in recruiter email automation is not better writing. It is better routing, better threading, and a draft the recruiter edits, not approves.

What we paid for in production

Five gotchas that cost us time, in case you are wiring something similar.

Bullhorn token refresh races. The 10-minute token plus aggressive parallelism means two requests can both notice a 401 and both try to refresh, invalidating each other. Wrap the refresh in a mutex per recruiter.

Out-of-office floods. A single weekend of out-of-office bounces from a 600-candidate outreach will mint 600 useless notes if you let it. The classifier catches them, but the cheap pre-filter is Auto-Submitted: auto-replied in the RFC headers. Drop those before they hit the model.

Reply-All and BCC drift. Candidates loop in colleagues, partners, sometimes their next interviewer. The agent has to decide whose addresses belong on the thread next time. We default to the original to/cc set and require the recruiter to opt any new address in. That kept one accidental disclosure from happening on day 30.

GDPR-compliant logging. The classifier sees candidate emails. That is personal data. We log the intent and confidence, not the raw body, and we let candidates request deletion through a route the recruiter never sees. Dutch firms get audited on this; do not skip it.

Watch renewal. Already mentioned, worth repeating: silent failure, seven-day expiry, set an external monitor.
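The out-of-office pre-filter from the list above can be sketched in a few lines. `headers` is assumed to be a plain object of header name to value; the X-Autoreply check is an extra heuristic for responders that skip the standard header, not part of RFC 3834.

```javascript
// Returns true for machine-generated mail that should be dropped before classification.
function isAutoReply(headers) {
  const h = {}
  for (const [name, value] of Object.entries(headers)) {
    h[name.toLowerCase()] = String(value).trim().toLowerCase()
  }
  // RFC 3834: any Auto-Submitted value other than "no" marks an automatic message.
  const autoSubmitted = h['auto-submitted']
  if (autoSubmitted !== undefined) {
    return autoSubmitted.split(';')[0].trim() !== 'no'
  }
  // Assumption: some responders only set the non-standard X-Autoreply header.
  return h['x-autoreply'] === 'yes'
}
```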

What the recruiter sees on a Tuesday now

At 17:00 on a normal day, the inbox shows 30 to 60 candidate replies. On a day with 50, about 42 of them are already tagged, threaded into the right Bullhorn record, and pre-drafted. The recruiter opens each thread, reads, and either sends or edits. The eight or so that the classifier flags as low-confidence get human triage. By 18:00 she is done. The spreadsheet of pasted names is in the recycle bin.

When we built this for the Amsterdam recruiting firm, the thing that broke first was Gmail watch renewal, exactly as warned above. The fix took an hour. The second thing that broke was the recruiters' trust in the drafts. We had to throw out the open-ended generation and rebuild on templates before the team would actually click Send. Both are the kind of fix you only find in production, which is why we now build every AI agent with a one-week live-shadow phase before any draft goes out the door.

If you want to find out whether this would help your own inbox: grep your sent folder for the five replies you write most often this week. That is the surface area your agent has to cover before anything else.

FAQ

Why not use Bullhorn's built-in email integration?

Bullhorn email is fine for sending and basic logging. It does not classify intent, it does not draft replies, and it does not thread across alias addresses. The Gmail agent handles all three on top of Bullhorn, not instead of it.

Does the agent send emails without recruiter approval?

No. Every outbound reply is a draft that the recruiter opens, reviews, and sends manually. The agent never speaks for the firm. That single rule is what made recruiters trust it.

How long did the wiring take?

Three weeks of build, one week of live-shadow alongside the recruiters, one week of template revision based on what they actually edited. Five weeks from kickoff to fully replacing the manual queue.

What happens when Gmail's watch expires?

A cron renews it every four days. If renewal fails three times in a row, the team gets a Slack ping and the agent falls back to a 60-second history poll until renewal succeeds. No replies get lost.
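A sketch of the failure counter behind that answer. `renewWatch` and `alert` are hypothetical async callbacks (the watch call and the Slack ping); starting and stopping the 60-second poll is left to the caller, which reads the `polling` flag.

```javascript
// Track consecutive watch-renewal failures and decide when to alert and poll.
function createRenewalMonitor({ renewWatch, alert, maxFailures = 3 }) {
  let failures = 0
  let polling = false
  return {
    async tick() {
      try {
        await renewWatch()
        failures = 0
        polling = false // push is healthy again, the caller can stop polling
      } catch (err) {
        failures += 1
        if (failures >= maxFailures && !polling) {
          polling = true // the caller should start the fallback history poll
          await alert(`Gmail watch renewal failed ${failures} times: ${err.message}`)
        }
      }
      return { failures, polling }
    },
  }
}
```

The alert fires once when the threshold is crossed rather than on every later failure, so a long outage does not flood the channel.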
