Hybrid CRM audit: the checklist before a sales agent
Three numbers decide whether your hybrid CRM can host a sales agent: duplicate drift on the top 25 pipelines, label coverage on the top 10 stages, and one 36-hour test.

It's 09:14 on a Tuesday in Hilversum. We're three days away from quoting a sales-agent retrofit for a B2B SaaS firm doing roughly €11M ARR. On the left monitor, a deal sits at Stage 4 in Salesforce: Acme B.V., €48k, expected close Q3, owned by an account exec who started in May. On the right monitor, the same logo — spelled "Acme BV", no period — sits in HubSpot, marked closed-won three weeks ago, owned by a sales rep who left in February. Both records have the consent checkbox ticked. Neither timestamp matches the other.
This is the moment the audit pays for itself. If we quote the retrofit now, the agent we ship will treat both records as gospel, email the wrong contact from the wrong system, and cite a consent timestamp that may or may not survive an access request. So before we quote, we run a four-hour audit on the seller's data. Same checklist every time. We don't quote without it.
Why the audit comes before the quote
A sales agent inherits whatever the CRM tells it. On a clean single-system setup that's a manageable problem — bad data hurts the agent, the agent surfaces the bad data, you fix it in a week. On a HubSpot/Salesforce hybrid that's been running unsupervised for two years, the bad data isn't just bad. It's asymmetric. One side believes one thing, the other side believes another, and the agent picks whichever record it queried first.
The teams we see shipping agents in production this year keep landing on the same conclusion: reliability isn't a model problem, it's a state problem. Agents are reliable to the extent that the world they read is consistent. Hybrid CRMs are where consistency goes to die quietly, at 04:15, on a Tuesday, with no log retention.
So the audit checks three things, in this order: duplicate-contact drift across the top 25 pipelines, association-label coverage on the top 10 deal-stages, and whether three specific workflows can survive a 36-hour sync pause without losing the consent trail the AVG (the Dutch name for the GDPR) requires. If any of the three fails, we don't quote the retrofit. We quote a reconcile first, separately, so the seller can decide whether to do that work in-house or with us.
Duplicate-contact drift on the top 25 pipelines
"Drift" is the share of contacts in a pipeline that exist on both sides but disagree about something the agent cares about: email, primary phone, account owner, lifecycle stage, or consent timestamp. We compute it pipeline by pipeline because the top 25 pipelines do roughly 90% of the agent's work, and a single global drift number hides the bad ones. A healthy-looking 8% global average often turns out to be 4% on the small pipelines and 31% on the one pipeline that actually books revenue.
The script is boring on purpose. Pull contacts via both APIs, normalise emails (lowercase, strip plus-aliases, strip dots from the local part of Gmail addresses), and join on the normalised address.
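```javascript
// audit/drift.mjs
import { fetchHubspotContacts, fetchSalesforceContacts } from './crm.mjs'
// TOP_25_PIPELINES: the seller's 25 busiest pipeline ids (config module name assumed)
import { TOP_25_PIPELINES } from './config.mjs'

// Join key: lowercase, trim, drop +aliases, drop dots in Gmail local parts.
const norm = (e) => {
  const [local = '', domain = ''] = String(e ?? '').trim().toLowerCase().split('@')
  const base = local.replace(/\+.*$/, '')
  const gmail = domain === 'gmail.com' || domain === 'googlemail.com'
  return `${gmail ? base.replace(/\./g, '') : base}@${domain}`
}

// Fields the agent acts on. Names are assumed to match what crm.mjs returns, with
// comparable primitive values (e.g. ISO strings for consentAt). Email is the join
// key, so email disagreements surface as orphans rather than as drift.
const FIELDS = ['owner', 'phone', 'lifecycleStage', 'consentAt']

for (const pipeline of TOP_25_PIPELINES) {
  const hs = await fetchHubspotContacts({ pipeline })
  const sf = await fetchSalesforceContacts({ pipeline })
  const hsByEmail = new Map(hs.filter((c) => c.email).map((c) => [norm(c.email), c]))
  let agree = 0, disagree = 0, orphans = 0
  for (const c of sf) {
    const match = c.email ? hsByEmail.get(norm(c.email)) : undefined
    if (!match) orphans++ // no HubSpot counterpart
    else if (FIELDS.every((f) => match[f] === c[f])) agree++
    else disagree++
  }
  // Drift: share of contacts present on both sides that disagree on any field.
  const both = agree + disagree
  const drift = both ? (disagree / both) * 100 : 0
  console.log(`${pipeline}: drift=${drift.toFixed(1)}%, orphans=${orphans}`)
}
```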
The thresholds we use, learned from thirty-odd of these audits across Dutch SMEs:
- Under 12% drift on a top-25 pipeline: the agent can be retrofitted, with a light cleanup pass on the worst offenders during week one.
- 12–25%: amber. We quote a two-week reconcile before the retrofit, and we charge for it separately so the seller can decide whether to do it themselves.
- Above 25%: the agent will hallucinate ownership and consent. We refuse to retrofit until the seller picks a system of record for that pipeline. That conversation usually takes longer than the technical work.
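The three bands reduce to a small decision function, applied per pipeline rather than to the global average. A minimal sketch; the function name and the returned labels are hypothetical, and the band edges follow the list as written, so exactly 12% and exactly 25% both land in the reconcile band.

```javascript
// Hypothetical helper: map one pipeline's drift percentage to an audit band.
// Band edges follow the thresholds in the list: under 12, 12 to 25 inclusive, above 25.
function classifyDrift(driftPct) {
  if (!Number.isFinite(driftPct) || driftPct < 0) {
    throw new RangeError(`invalid drift percentage: ${driftPct}`)
  }
  if (driftPct < 12) return 'retrofit' // light cleanup pass in week one
  if (driftPct <= 25) return 'reconcile-first' // two-week reconcile, quoted separately
  return 'pick-system-of-record' // no retrofit until one side owns the pipeline
}
```

Run it once per top-25 pipeline; a single call on the global average would reintroduce exactly the blind spot the per-pipeline scoring exists to remove.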
One thing the drift script doesn't catch is stale ownership. A contact's owner can match on both sides and still be wrong: if the AE who owns it left in January and HR never deprovisioned the mailbox, the agent will queue messages "from" an address nobody reads. Before week one of the retrofit we cross-check the owner column against the seller's HRIS exit list. It's a join nobody enjoys writing. On a typical audit it flags roughly 6% of contacts; on one particularly bad seller it flagged 19%.
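A minimal sketch of that cross-check, assuming each contact carries its owner's email address and the HRIS export is a list of `{ email, exitDate }` rows; both shapes and the function name are hypothetical. Owners who left on or before the audit date are flagged, and both sides are lower-cased so a capitalisation difference doesn't hide a match.

```javascript
// Hypothetical shapes:
//   contacts: [{ id, owner }]        owner = the owning user's email address
//   exits:    [{ email, exitDate }]  exitDate = ISO date string from the HRIS export
// Returns the contacts whose owner left on or before `asOf`.
function findStaleOwners(contacts, exits, asOf = new Date()) {
  const leftOn = new Map(
    exits.map((e) => [e.email.trim().toLowerCase(), new Date(e.exitDate)]),
  )
  return contacts.filter((c) => {
    const exit = leftOn.get(String(c.owner ?? '').trim().toLowerCase())
    return exit !== undefined && exit <= asOf
  })
}
```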
Association-label coverage on the top 10 deal-stages
HubSpot's association labels (for example Decision Maker, Champion, Influencer, Billing Contact, Procurement) are the closest thing a CRM gives you to "who matters on this deal". Salesforce's equivalent is Contact Roles on the Opportunity. A sales agent uses these labels to decide who to open with, who to CC, and who never to email without the AE's say-so. Without them, the agent picks the most recently touched contact and hopes.
The audit counts, per deal-stage in the top 10, the share of deal-contact associations that carry a label on either side:
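```sql
-- audit/labels.sql (PostgreSQL: COUNT(*) FILTER needs 9.4 or later)
-- Per stage: share of deal-contact associations carrying a label on either side.
-- Assumes deal_stages has one row per deal (deal_id) with its stage (id, name),
-- and that a.label already holds the HubSpot label or the Salesforce Contact Role.
SELECT
  ds.name AS stage,
  COUNT(*) FILTER (WHERE a.label IS NOT NULL) * 1.0
    / COUNT(*) AS coverage
FROM deal_stages ds
JOIN deal_contact_associations a USING (deal_id)
WHERE ds.id IN (SELECT id FROM TOP_10_STAGES)
GROUP BY ds.name
ORDER BY coverage ASC;
```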
Below 60% coverage on a late-funnel stage (proposal, contract-out, verbal), the agent will email the wrong person on the wrong day. We've watched an agent running on uncleaned data send a renewal prompt to the contract signer's predecessor, who had left eighteen months earlier. The address still resolved through a forwarding rule, but the email ended up in an inbox nobody read, the deal went quiet for a quarter, and the post-mortem blamed the agent. The agent did exactly what its data told it to.
A 90% global label coverage can hide a 40% coverage on your contract-out stage. Score per stage. Never as one number.
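To see how one global number hides a weak stage, here is a minimal in-memory sketch of the same per-stage calculation the SQL performs, plus the 60% late-funnel check. The association shape (`{ stage, label }`), the function name and the late-funnel stage ids are assumptions for illustration.

```javascript
// Hypothetical association shape: { stage: 'contract-out', label: 'Champion' | null }
const LATE_FUNNEL = new Set(['proposal', 'contract-out', 'verbal'])

function labelCoverage(associations, minLateFunnel = 0.6) {
  const perStage = new Map() // stage -> { labelled, total }
  let labelled = 0
  for (const { stage, label } of associations) {
    const s = perStage.get(stage) ?? { labelled: 0, total: 0 }
    s.total++
    if (label != null) {
      s.labelled++
      labelled++
    }
    perStage.set(stage, s)
  }
  const byStage = Object.fromEntries(
    [...perStage].map(([stage, s]) => [stage, s.labelled / s.total]),
  )
  // Late-funnel stages under the floor are where the agent mails the wrong person.
  const flagged = Object.keys(byStage).filter(
    (stage) => LATE_FUNNEL.has(stage) && byStage[stage] < minLateFunnel,
  )
  const overall = associations.length ? labelled / associations.length : 0
  return { overall, byStage, flagged }
}
```

With 80 discovery associations (77 labelled), 10 proposal (9 labelled) and 10 contract-out (4 labelled), `overall` comes out at 0.9 while `byStage['contract-out']` is 0.4, and contract-out is the only stage flagged.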
The 36-hour sync pause at 04:15
March 2025, a client in Utrecht. Their integration tool, a popular Make.com replacement, silently paused its HubSpot→Salesforce queue at 04:15 on a Tuesday after an auth token rotated and nobody updated the connector. The queue backed up for 36 hours before someone in standup noticed a deal hadn't moved. By the time we were called, 312 consent state changes had happened on the HubSpot side without propagating: new subscriptions, two unsubscribes, and six explicit double opt-ins recorded via a webform tied to a download.
The AVG question landed two weeks later, when a contact filed an Article 15 access request: show me when I gave consent, on which channel, and what the form text said at that moment. The middleware logs were already gone — seven-day retention, default. HubSpot had a timestamp. Salesforce had a different one, written when the queue drained on Thursday afternoon. The form text on the webform had been edited in between by a marketing intern. Nobody could prove which version the contact actually agreed to.
The maximum fine under the AVG is €20M or 4% of worldwide annual turnover, whichever is higher. In practice the Dutch Autoriteit Persoonsgegevens rarely comes anywhere near that ceiling, but a single complaint can pull six weeks of operator time into evidence collection. In our experience, the middleware logs the regulator asks for are gone by the time it asks. Build as if they're not there.
This is the test we apply to every workflow in the audit: if the sync pauses for 36 hours starting at 04:15, can you still reconstruct the consent trail for every event in that window, with timestamps and the exact text the contact saw? Most workflows fail it. Three patterns survive.
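The test can be run mechanically against whatever store a workflow treats as its consent record. A minimal sketch, assuming events carry the same fields as the ledger rows in the next section (`occurred_at`, `channel`, `form_text`); the function name, and the rule that an undated event counts as a failure, are our assumptions about how to score it.

```javascript
// Hypothetical scorer for the 36-hour sync-pause test.
// events: [{ occurred_at, channel, form_text }], timestamps as ISO 8601 strings.
function pauseTest(events, pauseStart, hours = 36) {
  const start = Date.parse(pauseStart)
  const end = start + hours * 60 * 60 * 1000
  const undated = [] // can't be placed in time at all: an automatic failure
  const inWindow = []
  for (const e of events) {
    const t = Date.parse(e.occurred_at ?? '')
    if (Number.isNaN(t)) undated.push(e)
    else if (t >= start && t < end) inWindow.push(e)
  }
  // Inside the window, every event needs its channel and frozen form text.
  const incomplete = inWindow.filter((e) => !e.channel || !e.form_text)
  const failures = [...undated, ...incomplete]
  return { pass: failures.length === 0, inWindow: inWindow.length, failures }
}
```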
Three workflows that survive the sync pause
Consent capture writes to a ledger first
Consent never goes directly into HubSpot or Salesforce. It goes into a third place — a tiny append-only Postgres table — with a server-side timestamp, the rendered form text frozen at the moment of capture (not a reference to a template that might change next week), the IP, and the user agent. HubSpot and Salesforce both become downstream readers. If the sync pauses, the ledger is untouched and the AVG trail is intact regardless of which side is up.
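```sql
-- consent ledger schema (PostgreSQL)
CREATE TABLE consent_events (
  id            BIGSERIAL PRIMARY KEY,
  contact_email TEXT NOT NULL,
  event_type    TEXT NOT NULL
                CHECK (event_type IN ('opt_in', 'opt_out', 'doi_confirmed')),
  channel       TEXT NOT NULL,          -- 'webform', 'api', 'manual'
  form_text     TEXT NOT NULL,          -- rendered text, frozen at capture
  ip            INET,
  user_agent    TEXT,
  occurred_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),  -- server-side clock
  source_system TEXT NOT NULL           -- 'webform-v3', 'hubspot-form', ...
);

-- No UPDATE, no DELETE, ever. consent_writer is a hypothetical application role;
-- keep the application off the table owner's role, which could still modify rows.
GRANT SELECT, INSERT ON consent_events TO consent_writer;
GRANT USAGE ON SEQUENCE consent_events_id_seq TO consent_writer;
REVOKE UPDATE, DELETE, TRUNCATE ON consent_events FROM consent_writer;
```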
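On the capture side, the point is to store the text the contact actually saw rather than a reference to a template. A minimal sketch of turning a form submission into a `consent_events` row; the input field names, `INSERT_SQL` and `buildConsentRow` are hypothetical, and `occurred_at` is deliberately left to the database default so the server clock, not the browser, sets it.

```javascript
// Hypothetical capture helper: turns a form submission into a consent_events row.
const EVENT_TYPES = new Set(['opt_in', 'opt_out', 'doi_confirmed'])

const INSERT_SQL = `INSERT INTO consent_events
  (contact_email, event_type, channel, form_text, ip, user_agent, source_system)
  VALUES ($1, $2, $3, $4, $5, $6, $7)`

function buildConsentRow({ email, eventType, channel, formText, ip, userAgent, sourceSystem }) {
  if (!EVENT_TYPES.has(eventType)) throw new Error(`unknown event_type: ${eventType}`)
  if (!email || !channel || !formText || !sourceSystem) {
    throw new Error('contact_email, channel, form_text and source_system are required')
  }
  // Key order matches INSERT_SQL, so Object.values(row) is the parameter list.
  // occurred_at is omitted on purpose: the database's NOW() is the timestamp of record.
  return Object.freeze({
    contact_email: email.trim().toLowerCase(),
    event_type: eventType,
    channel,
    form_text: formText, // the exact rendered text, never a template id
    ip: ip ?? null,
    user_agent: userAgent ?? null,
    source_system: sourceSystem,
  })
}
```

With node-postgres, for example, `client.query(INSERT_SQL, Object.values(row))` writes the row. The role doing the writing needs INSERT on the table and USAGE on its id sequence, nothing more.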
Stage transitions trigger durable jobs, not webhooks
A deal moving to "Contract Out" should fire a contract-send. If you wire this with a Zapier webhook or an in-CRM workflow that POSTs to your contract tool, a sync pause means the trigger fires twice, or never, or against the wrong stage when the queue drains. We queue the trigger into a durable job runner (Inngest or Temporal both work) with an idempotency key of deal_id:to_stage_id:transition_timestamp. If the sync resumes and the trigger arrives a second time, the job runner sees the key and no-ops. The agent's contract gets sent once, with a timestamp the system can defend.
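The deduplication a durable runner gives you comes down to a small amount of logic. A minimal in-memory sketch of that behaviour, assuming a transition event shaped `{ dealId, toStageId, transitionAt }`; `TransitionQueue` is hypothetical and is not Inngest's or Temporal's API, both of which provide their own idempotency and persistence. It also records each accepted payload, a stand-in for the replay log described next.

```javascript
// Hypothetical in-memory stand-in for a durable runner's dedupe and history.
class TransitionQueue {
  constructor(handler) {
    this.handler = handler // e.g. the contract-send step
    this.seen = new Map() // idempotency key -> recorded job
  }

  static key({ dealId, toStageId, transitionAt }) {
    return `${dealId}:${toStageId}:${transitionAt}`
  }

  async enqueue(transition) {
    const key = TransitionQueue.key(transition)
    if (this.seen.has(key)) return { status: 'duplicate', key } // redelivered trigger: no-op
    // Record the key before awaiting, so a concurrent redelivery is also rejected.
    const job = { key, payload: transition, acceptedAt: new Date().toISOString() }
    this.seen.set(key, job)
    job.result = await this.handler(transition)
    return { status: 'accepted', key }
  }

  history(dealId) {
    return [...this.seen.values()].filter((j) => j.payload.dealId === dealId)
  }
}
```

Unlike a real runner, this sketch keeps state in memory and doesn't retry a failed handler; it only shows why the second delivery of the same transition is a no-op.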
The other thing the durable runner buys you is a replay log. Six weeks after the agent goes live, someone in legal will ask why a particular contract went out at 16:42 on a Wednesday. With a real job runner you open the deal_id in the dashboard and see the trigger payload, the stage at the time of the trigger, and the exact prompt the agent assembled around it. Without that, the answer is a shrug, and a shrug is what turns a question into an audit.
Unsubscribe checks read from the ledger on every send
The tempting design is to sync unsubscribes between HubSpot and Salesforce on a five-minute cadence. It is also the brittle one. During a 36-hour pause, an unsubscribed contact will still get the next campaign from whichever side doesn't have the update yet, and that's a complaint to the Autoriteit Persoonsgegevens waiting to happen. Instead, every send-time check (the agent's, HubSpot's marketing emails, Salesforce's outbound) hits the consent ledger directly on the way out. It's slower by maybe 40ms, and it can't fall out of step during a sync pause.
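At send time the check is a single lookup: find the contact's most recent consent event and send only if it isn't an opt-out. A minimal sketch over in-memory ledger rows; the function name is hypothetical, and treating "no recorded event" as do-not-send, and a plain `opt_in` as sufficient without a later `doi_confirmed`, are assumptions to align with your own consent policy. Against the Postgres ledger, the same lookup is one query on `contact_email` ordered by `occurred_at DESC` with `LIMIT 1`.

```javascript
// rows: consent_events rows ({ id, contact_email, event_type, occurred_at });
// contact_email is assumed to be stored lower-cased.
function canSend(rows, email) {
  const target = email.trim().toLowerCase()
  // Newer = later occurred_at; equal timestamps fall back to the higher id.
  const newer = (a, b) =>
    Date.parse(a.occurred_at) - Date.parse(b.occurred_at) || (a.id ?? 0) - (b.id ?? 0)
  let latest = null
  for (const r of rows) {
    if (r.contact_email === target && (latest === null || newer(r, latest) > 0)) latest = r
  }
  // Assumptions: no recorded event means no send; the latest event decides.
  return latest !== null && latest.event_type !== 'opt_out'
}
```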
The consent trail belongs in a system that doesn't sync. The moment you make it the CRMs' job to agree on consent state, you've made AVG compliance contingent on uptime.
What the audit hands back
The audit ends with a one-page PDF. Three numbers up top: average drift on the top-25 pipelines, average label coverage on the top-10 stages, and a pass/fail for each workflow under the 36-hour sync-pause test. Below that, a ranked list of the pipelines and stages that need reconciling, with the worst three offenders called out by name. We mark the agent retrofit as ready, ready-after-reconcile, or not-yet.
The list is dull on purpose. Each row names a pipeline or stage, its current drift or coverage number, the euro value of the deals sitting in that bucket right now, and an estimate of the hours to reconcile. We sort by deal value at risk, not by drift percentage: a 35% drift on a pipeline that runs six deals a year matters less than a 14% drift on the pipeline that runs four hundred. The RevOps owner reads it, picks a top three, and we either quote that work or hand the list back for the in-house team to attack on their own clock.
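Sorting by value at risk rather than by percentage is what keeps a small, messy pipeline from crowding out the one that books revenue. A minimal sketch, assuming value at risk is the open deal value in the bucket multiplied by its drift share; that definition, the row shape and the function name are our simplification, not a standard metric.

```javascript
// Hypothetical row shape: { name, driftPct, openValueEur, reconcileHours }
// Value at risk (assumed): open deal value in the bucket times its drift share.
function rankByValueAtRisk(rows) {
  return rows
    .map((r) => ({ ...r, valueAtRiskEur: r.openValueEur * (r.driftPct / 100) }))
    .sort((a, b) => b.valueAtRiskEur - a.valueAtRiskEur)
}
```

With made-up figures: a 35% drift on €120k of open deals puts about €42k at risk, while a 14% drift on €6M puts about €840k at risk, so the second pipeline tops the list despite the lower percentage.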
Not-yet is roughly one in five sellers. They don't like hearing it. They like it less when, six months later, somebody else's agent has emailed their procurement contact's former assistant about a renewal and the response is a polite-but-firm AVG enquiry from the contact's legal team.
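Read together, the thresholds in this article suggest a simple mapping from the audit's inputs to the three verdicts. A sketch of one way to encode them; the exact rules are an assumption, as are the input shapes and the function name: any pipeline above 25% drift means not-yet, while drift of 12% or more, a late-funnel stage under 60% coverage, or any failed sync-pause workflow means ready-after-reconcile.

```javascript
// Hypothetical inputs:
//   drifts:             per-pipeline drift percentages for the top 25
//   lateFunnelCoverage: per-stage label coverage (0..1) for late-funnel stages
//   workflows:          { workflowName: boolean } pass/fail under the 36-hour test
function retrofitVerdict({ drifts, lateFunnelCoverage, workflows }) {
  const worstDrift = Math.max(0, ...drifts)
  if (worstDrift > 25) return 'not-yet' // pick a system of record first
  const needsReconcile =
    worstDrift >= 12 ||
    lateFunnelCoverage.some((c) => c < 0.6) ||
    Object.values(workflows).some((passed) => !passed)
  return needsReconcile ? 'ready-after-reconcile' : 'ready'
}
```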
When we built the sales agent for a Hilversum B2B SaaS firm running this exact hybrid, the duplicate-contact drift came back at 31% on the top pipeline and label coverage at 44% on contract-out. We did a one-week reconcile, stood up the consent ledger, and only then started writing prompts. The plumbing under every AI agent we ship is the part nobody asks for until they've launched without it once.
Five-minute thing you could do today: open your Salesforce reports, run a contact-by-email duplicate report against your top pipeline, and divide unique emails by total contacts. If the result is under 0.88, your sales agent will be wrong before it sends the first email.
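If you'd rather run the same check on a contacts export than in the report builder, a minimal sketch: normalise each email (lowercase, trim, strip plus-aliases), count distinct addresses, and divide by total rows. The function name is hypothetical, and Gmail dot-stripping is left out here for brevity.

```javascript
// Hypothetical helper: unique normalised emails / total contact rows.
// Blank emails collapse into a single value, which deliberately lowers the ratio.
function uniqueEmailRatio(emails) {
  if (emails.length === 0) return 1
  const norm = (e) => String(e ?? '').trim().toLowerCase().replace(/\+[^@]*(?=@)/, '')
  return new Set(emails.map(norm)).size / emails.length
}
```

A result of 0.85, for example, means 15 of every 100 contact rows share an address with another row, which is under the 0.88 line.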
Key takeaway
The consent trail belongs in a system that doesn't sync — the moment the CRMs have to agree on consent state, AVG compliance is contingent on uptime.
FAQ
What counts as 'drift' between HubSpot and Salesforce?
Drift is the share of contacts that exist on both sides but disagree about email, owner, phone, lifecycle stage, or consent timestamp. We score it per pipeline, not globally, because the worst pipelines hide inside healthy averages.
Why score association-label coverage per deal stage instead of globally?
A 90% global coverage can hide a 40% coverage on contract-out, where the wrong contact costs you the deal. Late-funnel stages decide who the agent opens with, so they need their own number.
How does a consent ledger help with AVG?
It records consent in a third system that doesn't depend on CRM sync. Timestamps, form text and channel are frozen at capture, so an Article 15 access request can be answered even if HubSpot and Salesforce disagree for 36 hours.
Why not just pick one CRM instead of running a hybrid?
In our experience, most sub-€16M Dutch SMEs can't justify a migration mid-year, and sales and marketing rarely agree on which side to keep. The audit is what you do when the hybrid is the reality, not the goal.
How long does the audit take?
About four hours of script time on the seller's data plus a half-day to review the output with the RevOps owner. We deliver a one-page PDF with three numbers and a ranked reconcile list.