
Process automation playbook: 3 ATS systems, one queue

A senior recruiter has three ATS tabs open at 18:47, the same candidate in two of them, and 400 CVs left. Here is the playbook we used to fix it.

Jacob Molkenboer · Founder · A Brand New Company · 15 Jun 2026 · 11 min


It is 18:47 on a Tuesday in Tilburg. A senior recruiter has three browser tabs open: Bullhorn for the pipeline, Otys for the vacancy feeds, and Carerix because that is where her client's placement history has lived for the last twelve years. The same candidate appears in two of those tabs under slightly different spellings of his last name. She has 400 unread CVs in her inbox. She picks one up, sees a duplicate flag, puts it down to merge the duplicate in Carerix, and by the time she comes back she has lost the thread of the first candidate. This is the problem we were hired to fix with process automation.

The agency runs 22 recruiters and roughly 2,640 candidate submissions per week. Around a third are duplicates of someone already sitting in at least one of the three systems. The reconciliation work alone was eating about ninety minutes per recruiter per day, every day, with no end in sight. The brief was simple to state and not simple to ship: reconcile the three systems without ever auto-rejecting a CV that a recruiter has not personally rated.

What 2,640 weekly submissions actually look like

The first thing to understand is that "duplicate candidate" is not a single problem. It is at least four overlapping problems, and they need different fixes.

1. Same person, same email, two records. Trivial in theory, ten thousand edge cases in practice (personal vs. work email, agency-forwarded inboxes, Gmail dot tricks).
2. Same person, different email, same phone. Common after a job change.
3. Same person, no email or phone overlap, but same name, year of birth, and last employer. Very common when a candidate applies through a job board years apart.
4. Different people, same name. Frequent in the Dutch market, where common surnames stack up fast.

Before the agent existed, the agency had tried two things. First, a Friday duplicate-cleanup hour where every recruiter was supposed to dedup their own desk. It died inside three weeks because nobody finishes that hour with a smaller queue than they started, and nobody has the political capital to flag a colleague's record as wrong. Second, a third-party dedup vendor that ran against Bullhorn alone, ignored the other two systems entirely, and quietly cost €1,400 a month for the privilege. Neither approach survived contact with a multi-system desk.

The agent had to handle all four duplicate shapes without ever silently overwriting the wrong record. The single hard rule, agreed in the first scoping meeting and never relaxed, was that no candidate would ever be rejected, archived, or contacted by the system itself. The agent's job is to propose. A human's job is to decide.

The reconciliation key we settled on

We tried hashing email plus phone first. It missed half the duplicates. We tried a fuzzy name-and-date-of-birth match next. It generated a flood of false positives because of common Dutch surnames. The version that survived production is a composite key, scored, with a threshold.

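import re
import unicodedata

import phonenumbers  # third-party package: pip install phonenumbers

def normalize_name(s: str) -> str:
    # Decompose accented characters, drop the combining marks,
    # then lowercase and collapse runs of whitespace.
    s = unicodedata.normalize("NFD", s or "")
    s = "".join(c for c in s if unicodedata.category(c) != "Mn")
    return re.sub(r"\s+", " ", s.lower().strip())

def canonical_email(s: str) -> str:
    s = (s or "").strip().lower()
    # Gmail ignores dots and anything after "+" in the local part.
    if s.endswith("@gmail.com"):
        local, _, domain = s.partition("@")
        local = local.split("+", 1)[0].replace(".", "")
        s = f"{local}@{domain}"
    return s

def to_e164(raw: str, region: str = "NL") -> str | None:
    # Returns None for missing, unparseable, or invalid numbers.
    if not raw:
        return None
    try:
        n = phonenumbers.parse(raw, region)
    except phonenumbers.NumberParseException:
        return None
    if not phonenumbers.is_valid_number(n):
        return None
    return phonenumbers.format_number(
        n, phonenumbers.PhoneNumberFormat.E164)

def reconciliation_key(c) -> dict:
    # An empty string or None means "unknown"; the matcher should never
    # count two unknown values as equal.
    return {
        "email": canonical_email(c.email),
        "phone": to_e164(c.phone),
        "name":  normalize_name(c.full_name),
        "yob":   c.dob.year if c.dob else None,
    }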

The matching rule is deliberately conservative. Two records score as a hard match if any two of (email, phone, name+yob) are equal. Anything weaker that still reaches the review threshold is queued for a human, and nothing, hard match or not, is merged without review. We also log every near-miss with its score, so we can tune the threshold later from real production data instead of guesswork.

The scoring weights are not magic. Email exact match: 1.0. Phone exact match after E.164 normalisation: 1.0. Name plus year of birth match: 0.8. Name alone fuzzy match (Levenshtein distance of two or less on the normalised name): 0.4. The weights are summed into one score per pair of records. The threshold for a hard match is 1.6, the threshold for queue review is 1.0, and anything lower is suppressed unless the same pair surfaces again within thirty days. The thirty-day window is what catches the candidate who was rejected and then reapplied a few weeks later under a different email address.
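To make the weights and thresholds concrete, here is a minimal sketch of how a pair of keys could be scored. It is an illustration under assumptions, not the production matcher: the `Key` container, `score_pair`, and `classify` are hypothetical names, the fuzzy-name weight is assumed to apply only when the name-plus-year-of-birth weight did not, and the thirty-day resurfacing rule is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Weights and thresholds as quoted in the text above.
W_EMAIL, W_PHONE, W_NAME_YOB, W_NAME_FUZZY = 1.0, 1.0, 0.8, 0.4
HARD_MATCH, QUEUE_REVIEW = 1.6, 1.0


@dataclass(frozen=True)
class Key:  # hypothetical container mirroring the reconciliation key
    email: Optional[str]
    phone: Optional[str]
    name: str
    yob: Optional[int]


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def score_pair(a: Key, b: Key) -> float:
    score = 0.0
    # Missing values (None or "") never count as a match.
    if a.email and a.email == b.email:
        score += W_EMAIL
    if a.phone and a.phone == b.phone:
        score += W_PHONE
    if a.name and a.name == b.name and a.yob is not None and a.yob == b.yob:
        score += W_NAME_YOB
    elif a.name and b.name and levenshtein(a.name, b.name) <= 2:
        # Assumption: fuzzy name counts only when name+yob did not.
        score += W_NAME_FUZZY
    return score


def classify(score: float) -> str:
    if score >= HARD_MATCH:
        return "hard_match"
    if score >= QUEUE_REVIEW:
        return "queue_review"
    return "suppressed"
```

With these numbers, any two of the three strong signals clear 1.6, a lone email or phone match lands in the review queue, and a fuzzy name on its own stays suppressed.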

The four-eyes queue we wire into every write path

The most important sentence in this entire playbook: the agent never writes directly to any of the three ATSes. It writes to a proposals table. A recruiter reviews. A second recruiter approves. Only then does the dispatcher execute the underlying API call.

type ProposalAction =
  | "merge_candidates"
  | "update_status"
  | "create_note"
  | "tag_duplicate";

interface Proposal {
  id: string;
  system: "bullhorn" | "otys" | "carerix";
  action: ProposalAction;
  payload: Record<string, unknown>;
  diff: { before: unknown; after: unknown };
  confidence: number;     // 0..1, from the matcher
  reasoning: string;      // the agent's own explanation
  proposed_at: string;    // ISO 8601
  reviewed_by?: string;   // recruiter id, first eyes
  approved_by?: string;   // recruiter id, second eyes
  executed_at?: string;
}

Every proposal carries the exact serialized API call that would be made, a structural diff of the before and after state, the matcher's confidence score, and a short natural-language reasoning string the agent writes for the human reviewers. The queue UI shows all of that on a single screen, and the recruiter who reviewed first is locked out of the second approval. If the two reviewers disagree, the proposal stays in the queue until the team lead breaks the tie.
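The two-reviewer rule is easy to state and easy to get subtly wrong, so a small sketch may help. The one below is written in Python rather than the TypeScript of the interface above, and every name in it (`QueuedProposal`, `review`, `approve`, `can_execute`, `ReviewError`) is hypothetical; it only illustrates the invariant that the first reviewer can never be the approver.

```python
from dataclasses import dataclass
from typing import Optional


class ReviewError(Exception):
    """Raised when a review step would break the four-eyes rule."""


@dataclass
class QueuedProposal:  # hypothetical, reduced to the review fields
    id: str
    reviewed_by: Optional[str] = None  # first eyes
    approved_by: Optional[str] = None  # second eyes
    executed: bool = False


def review(p: QueuedProposal, recruiter_id: str) -> None:
    if p.reviewed_by is not None:
        raise ReviewError("proposal already has a first reviewer")
    p.reviewed_by = recruiter_id


def approve(p: QueuedProposal, recruiter_id: str) -> None:
    if p.reviewed_by is None:
        raise ReviewError("proposal has not been reviewed yet")
    if recruiter_id == p.reviewed_by:
        raise ReviewError("first reviewer cannot also approve")
    p.approved_by = recruiter_id


def can_execute(p: QueuedProposal) -> bool:
    # The dispatcher should check this immediately before any API call.
    return (
        not p.executed
        and p.reviewed_by is not None
        and p.approved_by is not None
        and p.reviewed_by != p.approved_by
    )
```

Checking the invariant again inside `can_execute`, instead of trusting the UI lock alone, means a bug in the front end cannot turn into an unreviewed write.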

Warning

If you build a recruitment automation in the EU and let it reject, score, or even silently down-prioritise a candidate without human review, you are inside the scope of GDPR Article 22 on automated decision-making, and from 2 August 2026 onward also inside the high-risk obligations of the EU AI Act. Either treat the four-eyes queue as non-negotiable, or budget a year of legal work.

Carerix as the eldest sibling

The Tilburg agency's Carerix install is twelve years old. The candidate export endpoint is a SOAP call that returns XML and refuses to paginate. On a quiet day the response is around 7,000 records; against the production load balancer it times out at sixty seconds. We worked around it the same way you work around any old API that does not respect you: cursor by createdAt, persist the checkpoint to Postgres between runs, back off on a SOAP fault that mentions "Service Unavailable".

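import time
from datetime import datetime
from typing import Iterator

# carerix, SoapFault, Candidate and persist_checkpoint come from the
# project's own Carerix client and storage layer.

def walk_carerix_candidates(checkpoint: datetime) -> Iterator[Candidate]:
    cursor = checkpoint
    while True:
        try:
            batch = carerix.candidates(created_after=cursor, limit=200)
        except SoapFault as e:
            # Back off and retry the same cursor while the endpoint is overloaded.
            if "Service Unavailable" in str(e):
                time.sleep(30)
                continue
            raise
        if not batch:
            return
        for c in batch:
            yield c
        # Advance to the last record seen and persist it, so a restart resumes here.
        cursor = batch[-1].created_at
        persist_checkpoint("carerix", cursor)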

Bullhorn and Otys are easier neighbours. Bullhorn's REST API behaves the way a modern REST API should: paginated, JSON, rate-limited but documented. Otys runs a SOAP-and-REST hybrid that stays well-mannered as long as you respect the per-tenant token cap. The interesting work was never in the API calls. It was in the reconciliation key, the queue, and the rollback policy.

The legacy SOAP endpoint had one more surprise we found in week two. The createdAt field returned in the cursor response is the record's first-ever creation timestamp, which on a twelve-year-old install is sometimes 2014. For records that had been imported during a previous Carerix migration in 2017, the cursor would jump three years backwards on every restart, redoing the same work and burning the rate-limit budget for the day. The fix was a second cursor on lastModifiedAt with a sliding window of two days, plus a deduplicating set in Redis to drop records we had already processed in the current run.
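A rough sketch of that second cursor, under stated assumptions: the "sliding window" is read here as re-fetching everything modified since the checkpoint minus two days, a plain Python set stands in for the Redis set, and `Record`, `fetch`, and `walk_modified` are hypothetical names rather than anything from the Carerix client.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable, Iterator, NamedTuple, Set


class Record(NamedTuple):  # hypothetical minimal record shape
    id: str
    last_modified_at: datetime


WINDOW = timedelta(days=2)  # overlap re-read on every pass


def walk_modified(
    fetch: Callable[[datetime], Iterable[Record]],
    checkpoint: datetime,
    seen: Set[str],
) -> Iterator[Record]:
    """Yield records modified since (checkpoint - WINDOW), once per run.

    `fetch(since)` is assumed to return records whose last_modified_at
    is at or after `since`. `seen` plays the role of the per-run Redis set.
    """
    start = checkpoint - WINDOW
    for rec in fetch(start):
        if rec.id in seen:
            continue  # already processed in this run
        seen.add(rec.id)
        yield rec
```

The overlap deliberately re-reads two days of records on each pass; the set is what keeps that overlap from turning into repeated work inside a single run.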

Three weeks in shadow mode before any write went live

We shipped the agent into production with every write path disabled. For the first three weeks, the agent proposed; nothing executed. The queue UI was live but flagged as "shadow mode" and the dispatcher refused to fire. We logged every proposal alongside what recruiters actually did during the same period, and we measured the agreement rate.
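How the agreement rate is computed matters as much as the number itself. The sketch below assumes one simple definition: for every proposal where a recruiter also acted on the same pair of records, count it as agreement when the two actions are identical. The function name and the dictionary shapes are hypothetical.

```python
from typing import Dict


def agreement_rate(proposed: Dict[str, str], actual: Dict[str, str]) -> float:
    """Share of comparable proposals where the recruiter did the same thing.

    proposed: proposal id -> action the agent suggested
    actual:   proposal id -> action the recruiter took in the same period
    Proposals the recruiters never touched are left out of the denominator.
    """
    comparable = [pid for pid in proposed if pid in actual]
    if not comparable:
        raise ValueError("no overlapping proposals to compare")
    agreed = sum(proposed[pid] == actual[pid] for pid in comparable)
    return agreed / len(comparable)
```

Leaving untouched proposals out of the denominator is a design choice; counting them as disagreements would give a lower and arguably stricter number.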

The first day's agreement rate was 71%. Most of the gap was the name-matcher being too eager on common Dutch surnames. We tightened the threshold, added the year-of-birth requirement when names alone matched, and re-ran. By the end of week three the rate was 94%, and crucially, the 6% of remaining disagreements were almost always cases where the recruiter had context the agent could not have (a phone call, a client conversation, a LinkedIn message). At that point we turned on the dispatcher with a per-day cap of fifty executed proposals and ratcheted it up over a fortnight.
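The shadow-mode flag and the daily cap can live in one small gate in front of the dispatcher. This is a sketch only: `DispatcherGate` and `try_dispatch` are hypothetical names, the counter is kept in memory, and the default of fifty is the starting cap quoted above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict


@dataclass
class DispatcherGate:  # hypothetical gate in front of the real dispatcher
    shadow_mode: bool = True  # while True, nothing is ever executed
    daily_cap: int = 50       # starting cap from the text; raised gradually
    executed_per_day: Dict[date, int] = field(default_factory=dict)

    def try_dispatch(self, today: date) -> bool:
        """Return True and count the execution if it is allowed today."""
        if self.shadow_mode:
            return False
        used = self.executed_per_day.get(today, 0)
        if used >= self.daily_cap:
            return False
        self.executed_per_day[today] = used + 1
        return True
```

Raising the cap is then a one-field change, and a proposal that is refused simply stays approved in the queue until the next day.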

What surprised us in shadow mode was the asymmetry. The agent was right slightly more often than the recruiters on the easy cases, where two records shared an email and a phone number. On the ambiguous cases the recruiter's call beat the agent's score every time, and the gap was not close. That asymmetry is what convinced the team to keep human approval permanent rather than phase it out after a trust period. We do not want an agent that is right on average; we want one that is never wrong unilaterally.

Shadow mode is the cheapest insurance you will ever buy on a write-path automation. Three weeks of zero-impact logging give you the measured agreement rate that a steering committee will keep asking for and that you cannot fake.

The kill switch and the rollback policy

Every executed proposal is reversible. The dispatcher stores the structural diff of before-and-after state alongside the API call it sent, and it knows how to invert each of the four supported actions: merges can be split, status changes can be reverted, notes can be soft-deleted, duplicate tags can be removed. The rollback is itself a queued proposal that goes through the same four-eyes review, with one extra constraint: the original approver of an action is not allowed to approve its rollback. Different eyes, by design.
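As an illustration of how a rollback can be derived from the stored diff, the sketch below swaps the before and after states and records who is barred from approving. The inverse action names in `INVERSE` are invented for the example, as are `Executed`, `Rollback`, and both functions; the ordinary review step is omitted, and only the extra approver constraint is shown.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical inverse for each of the four supported actions.
INVERSE = {
    "merge_candidates": "split_candidates",
    "update_status": "update_status",  # revert by writing the old status
    "create_note": "soft_delete_note",
    "tag_duplicate": "untag_duplicate",
}


@dataclass(frozen=True)
class Executed:  # hypothetical record of an executed proposal
    id: str
    action: str
    before: Any
    after: Any
    approved_by: str


@dataclass
class Rollback:
    of: str               # id of the proposal being undone
    action: str
    before: Any
    after: Any
    barred_approver: str  # approver of the original action
    approved_by: Optional[str] = None


def propose_rollback(e: Executed) -> Rollback:
    # The rollback's target state is the original's starting state.
    return Rollback(of=e.id, action=INVERSE[e.action],
                    before=e.after, after=e.before,
                    barred_approver=e.approved_by)


def approve_rollback(r: Rollback, recruiter_id: str) -> None:
    if recruiter_id == r.barred_approver:
        raise PermissionError("original approver cannot approve the rollback")
    r.approved_by = recruiter_id
```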

One action skips the review queue entirely: the kill switch. If the agent's per-day error rate against any single ATS climbs above three percent, the dispatcher pauses globally, sends a Slack alert to the team lead, and refuses to dispatch any further proposals until a human runs an explicit unpause command from the operations console. The threshold was tuned once, in week six, after a Bullhorn deploy changed a status enum and we silently miswrote thirty-eight records before anyone noticed. Every process automation that touches production data needs an exit; this is ours. The cost of building the kill switch up front was about two engineering days. The cost of not having it the morning we found those thirty-eight records would have been incalculable.
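A kill switch of this kind is essentially a circuit breaker keyed on a per-system, per-day error rate. The sketch below is one possible shape, with assumptions called out: `KillSwitch` and its methods are hypothetical, `alert` is any callable standing in for the Slack notification, and `min_calls` is an added guard (not from the original design) so that a single early failure does not trip the switch.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, List, Tuple


class KillSwitch:
    def __init__(self, alert: Callable[[str], None],
                 threshold: float = 0.03, min_calls: int = 20) -> None:
        self.alert = alert          # stand-in for the Slack alert
        self.threshold = threshold  # three percent, from the text
        self.min_calls = min_calls  # assumption: minimum sample size
        self.paused = False
        # (system, day) -> [calls, errors]
        self._counts: Dict[Tuple[str, date], List[int]] = defaultdict(
            lambda: [0, 0])

    def record(self, system: str, day: date, ok: bool) -> None:
        counts = self._counts[(system, day)]
        counts[0] += 1
        if not ok:
            counts[1] += 1
        calls, errors = counts
        if (not self.paused and calls >= self.min_calls
                and errors / calls > self.threshold):
            self.paused = True  # global pause, across all systems
            self.alert(f"{system}: {errors}/{calls} errors on {day}")

    def allow_dispatch(self) -> bool:
        return not self.paused

    def unpause(self, operator: str) -> None:
        # Only an explicit human action lifts the pause.
        self.paused = False
```

The switch never resets itself; `unpause` has to be called by a person, which matches the explicit console command described above.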

What we deliberately did not automate

The shortest section of the playbook and the most important one. The agent does not send messages to candidates. It does not change candidate status from "in process" to "rejected". It does not edit a vacancy. It does not write a score. It does not email a client. Every one of those was on the original wishlist. Every one of them came off the list once we sat in a recruiter's chair for an afternoon and watched what those actions actually require (judgement, tone, a phone call, a screenshot of a CV with a coffee stain on it).

What is left for the agent is the substrate work: deduplication, cross-system status reconciliation, the suggested merges, proactive flagging of records that look stale or contradictory, and the four-eyes proposals. That is plenty. The 90 minutes per recruiter per day on reconciliation work is now closer to 12, and those 12 minutes are spent reviewing the queue, not chasing duplicates across three browser tabs. Process automation that respects judgement is automation that draws the right boundary, and the right boundary is almost always tighter than the original wishlist.
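For a sense of scale, the figures quoted in this article can be turned into a team-level number. The arithmetic below uses only values from the text (22 recruiters, 2,640 submissions a week, 90 and 12 minutes a day); the variable names are illustrative.

```python
RECRUITERS = 22
SUBMISSIONS_PER_WEEK = 2640
MINUTES_BEFORE = 90  # reconciliation minutes per recruiter per day, before
MINUTES_AFTER = 12   # the same figure after the agent went live

submissions_per_recruiter_week = SUBMISSIONS_PER_WEEK // RECRUITERS  # 120
saved_minutes_per_recruiter_day = MINUTES_BEFORE - MINUTES_AFTER     # 78
saved_team_hours_per_day = RECRUITERS * saved_minutes_per_recruiter_day / 60

print(f"{submissions_per_recruiter_week} submissions per recruiter per week")
print(f"{saved_minutes_per_recruiter_day} minutes saved per recruiter per day")
print(f"{saved_team_hours_per_day:.1f} recruiter-hours saved per day, team-wide")
```

That works out to 78 minutes per recruiter and roughly 28.6 recruiter-hours per working day across the desk.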

A practical move you can make this week

If you run a multi-ATS or multi-CRM operation, the smallest thing you can do this afternoon is the audit your team lead has been postponing for a year. Open all three systems side by side, pick fifteen candidates at random from the last calendar month, and trace each one across the three. Count the duplicates. Count how long the trace took. That number is your baseline, and you cannot improve what you have not measured.

When we built the process-automation agent for the Tilburg agency, the surprise was not the SOAP API or the duplicate detection. It was the auditability of the four-eyes queue itself: the Autoriteit Persoonsgegevens wanted to see every proposal, every reviewer, and every approver, kept for the full retention window. We solved it by writing the queue to an append-only Postgres table with row-level signatures and a Grafana view the compliance officer reads on Monday mornings. That is the actual deliverable. The reconciliation is the easy part.
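One way to picture row-level signatures on an append-only log is an HMAC per row. The sketch below goes one step further and chains each signature to the previous one, which is an assumption on top of what is described above; it also uses an in-memory list in place of the Postgres table, and the function names and the key handling are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, List


def sign_row(key: bytes, prev_sig: str, row: Dict[str, Any]) -> str:
    """HMAC-SHA256 over the previous signature plus the canonical row."""
    body = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, (prev_sig + body).encode("utf-8"),
                    hashlib.sha256).hexdigest()


def append_row(log: List[Dict[str, Any]], key: bytes,
               row: Dict[str, Any]) -> None:
    # Rows are only ever appended; existing entries are never rewritten.
    prev_sig = log[-1]["sig"] if log else ""
    log.append({"row": row, "sig": sign_row(key, prev_sig, row)})


def verify_log(log: List[Dict[str, Any]], key: bytes) -> bool:
    """Recompute every signature in order; any edit breaks the chain."""
    prev_sig = ""
    for entry in log:
        expected = sign_row(key, prev_sig, entry["row"])
        if not hmac.compare_digest(entry["sig"], expected):
            return False
        prev_sig = entry["sig"]
    return True
```

Chaining means that altering or deleting any earlier row invalidates every signature after it, which is the property an auditor usually cares about.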

Key takeaway

Never let the agent press a write button a human has not seen first. The four-eyes queue is the entire product, not a feature on the side.

FAQ

Can the agent reject candidates if a recruiter approves a standing rule?

No. Even with a rule, every rejection still runs through the four-eyes queue and is executed under the reviewer's identity, never the agent's. The standing rule only changes default values on the proposal, not who approves it.

How long did the build take end to end?

Twelve weeks from kickoff to first executed write. Three of those weeks were shadow mode, where the agent proposed every action but the dispatcher refused to fire.

What happens when the Carerix SOAP endpoint goes down?

The dispatcher backs off automatically and parks the write in the queue. A nightly job retries with the cursor checkpoint. Bullhorn and Otys writes are unaffected because each system has its own dispatcher.

Does this design fall under GDPR Article 22?

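Article 22 covers decisions based solely on automated processing, so as long as a human meaningfully reviews and approves every decision, the agent stays outside it. The four-eyes queue is that human-in-the-loop guarantee. The EU AI Act is a separate question: its high-risk classification follows from what a system is used for, recruitment included, so human review on its own does not settle it and is worth confirming with counsel.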

What if the two reviewers disagree on a proposal?

It sits in the queue until the team lead breaks the tie. Disagreements are under 6% of all proposals and almost always involve context the agent could not have seen, such as a phone call with the candidate.

