
Integrations

CRM API quirks: silent merges and dropped emoji in prod

We rolled out a sales-enablement agent across 24 BDRs in Utrecht. Three weeks in, the audit logs showed every leadSource attribution had been overwritten.

Jacob Molkenboer · Founder · A Brand New Company · 2 Jan 2026 · 10 min

The BDR sat down at 09:14 on a Tuesday in Utrecht and pasted a Dutch customer's email signature into a lead note. The signature ended in a tulip — 🌷 — and a line about the customer's daughter starting school that week. Our sales-enablement agent picked up the note, de-duplicated against a Salesforce Lead that had been imported from a LinkedIn export the prior Friday, merged the two records, and pushed the result back to the CRM API. The response was 200 OK. The tulip disappeared. The leadSource, which had said "LinkedIn — Joost referral", was overwritten with "Web - Default". Nobody noticed for three weeks.

We're fourteen agents into production at this point, and this one — a sales-enablement agent for a 24-person B2B SaaS team — taught us more about CRM REST quirks than the previous thirteen combined. The team was split across three CRMs: Salesforce for the enterprise pipeline, HubSpot for the SMB motion, and Pipedrive holding three years of legacy deals nobody wanted to migrate. Below is the cheatsheet that came out the other side, ranked by which quirks silently destroy data versus which ones merely annoy.

Tier 1: silent leadSource overwrites on merge

These are the ones we now wrap in a pre-merge snapshot. If you do nothing else after reading this post, do this.

Salesforce Lead.merge picks the master's leadSource, even if blank

The Salesforce composite Lead endpoints document that "the master record wins" for most fields, but the wording hides a sharp edge: LeadSource on the master wins even when the master's value is blank and the duplicate's value is the real attribution. We confirmed this in a sandbox with seven test rows. The fix is sequence-sensitive:

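# MASTER_LEAD_SOURCE / DUPLICATE_LEAD_SOURCE: values read from both Leads beforehand.
# Only copy the attribution when the master's LeadSource is blank.
if [ -z "$MASTER_LEAD_SOURCE" ] && [ -n "$DUPLICATE_LEAD_SOURCE" ]; then
  curl -sS --fail -X PATCH \
    "$SF_INSTANCE/services/data/v60.0/sobjects/Lead/$MASTER_ID" \
    -H "Authorization: Bearer $SF_TOKEN" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d "$(jq -n --arg ls "$DUPLICATE_LEAD_SOURCE" '{LeadSource: $ls}')"  # jq escapes quotes safely
fi
# THEN run the merge. Order matters.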

We patch the master with the duplicate's LeadSource first if the master is blank, then merge. The merge will not undo a PATCH that ran a millisecond earlier.
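The same guard in Python, as a minimal sketch. It assumes the agent already holds the master and duplicate Leads as dicts keyed by Salesforce field API names; `leadsource_prepatch` is a hypothetical helper, not part of any SDK. It takes a list of duplicates and uses the first non-blank value, which is a design choice of the sketch.

```python
from typing import Any, Optional


def leadsource_prepatch(master: dict[str, Any],
                        duplicates: list[dict[str, Any]]) -> Optional[dict[str, str]]:
    """PATCH body to send to the master *before* the merge, or None if nothing to fix.

    Fires only when the master's LeadSource is blank; the first duplicate with a
    non-blank value supplies the attribution.
    """
    if (master.get("LeadSource") or "").strip():
        return None  # master already has attribution, and the merge will keep it
    for dup in duplicates:
        value = (dup.get("LeadSource") or "").strip()
        if value:
            return {"LeadSource": value}
    return None
```

Send the returned body as the PATCH above, wait for the 204 No Content, and only then call the merge.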

HubSpot contact merge collapses lifecyclestage history

HubSpot's contacts API exposes POST /crm/v3/objects/contacts/merge, and the docs are clear that the primary contact's properties win. What they don't make obvious is that the property-history endpoint loses the secondary's lifecyclestage timeline entirely. If the secondary went MQL → SQL → Opportunity and the primary stayed at "lead," the merged record looks like a lead that has never moved. For attribution dashboards this is fatal.
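One way to keep the timeline is to read it before the merge. A minimal sketch, assuming HubSpot's v3 single-contact GET accepts `propertiesWithHistory=lifecyclestage` and returns each change as an entry with `value` and `timestamp`. `fetch_lifecycle_history` and `combined_timeline` are hypothetical helper names, and the combined result belongs in your own audit store, since the merged contact will not carry it.

```python
import json
import urllib.request
from typing import Any

HUBSPOT_API = "https://api.hubapi.com"


def fetch_lifecycle_history(contact_id: str, token: str) -> list[dict[str, Any]]:
    """Read one contact's lifecyclestage history; call it for every contact before merging."""
    url = (f"{HUBSPOT_API}/crm/v3/objects/contacts/{contact_id}"
           "?propertiesWithHistory=lifecyclestage")
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumed shape: {"propertiesWithHistory": {"lifecyclestage": [{"value": ..., "timestamp": ...}]}}
    return body.get("propertiesWithHistory", {}).get("lifecyclestage", [])


def combined_timeline(*histories: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Union of stage changes across the pre-merge contacts, deduplicated, oldest first.

    Sorting on the raw timestamp string assumes every entry uses the same ISO 8601 format.
    """
    changes = {(entry["timestamp"], entry["value"]) for history in histories for entry in history}
    return sorted(changes)
```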

Pipedrive deal merge silently drops non-primary custom fields

Pipedrive's deal merge endpoint returns 200 OK with the merged deal body. Custom fields set on the non-primary deal but unset on the primary are not in the response. They are also not in the deal afterwards. There is no warning header, no list of discarded fields in the body, nothing. We caught it because a finance custom field (the contract reference for invoicing) vanished on three merges in one afternoon.
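A post-merge check for exactly this case, as a sketch. It assumes Pipedrive exposes custom fields on the deal object under 40-character hex hash keys, and that you kept GET copies of both deals from before the merge; `dropped_custom_fields` is a hypothetical name.

```python
import re
from typing import Any

# Assumption: custom-field keys on a Pipedrive deal are 40-character hex hashes.
CUSTOM_KEY = re.compile(r"^[0-9a-f]{40}$")


def _is_set(value: Any) -> bool:
    return value not in (None, "")


def dropped_custom_fields(primary: dict[str, Any], secondary: dict[str, Any],
                          merged: dict[str, Any]) -> dict[str, Any]:
    """Custom fields set on the non-primary deal, unset on the primary, and missing after merge.

    The returned dict is the body to re-PATCH onto the merged deal.
    """
    return {
        key: value
        for key, value in secondary.items()
        if CUSTOM_KEY.match(key)
        and _is_set(value)
        and not _is_set(primary.get(key))
        and not _is_set(merged.get(key))
    }
```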

Salesforce duplicate-rule bypass

The Sforce-Duplicate-Rule-Header: allowSave=true header is the recommended way to push a record through a strict duplicate rule. It also silently skips the dedupe altogether. We caught this in audit logs: 41 leads created in a single afternoon, all with allowSave=true, all duplicates of existing leads. The agent had been instructed to set the header on retry. It set it on every call.
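A sketch of scoping the header to the one retry it was meant for. It assumes the failed first attempt surfaced Salesforce's `DUPLICATES_DETECTED` error code and that headers are assembled per request from a shared base; `headers_for_attempt` is a hypothetical helper. It also strips the header from any inherited base, which is one way a "retry only" header ends up on every call.

```python
from typing import Optional

DUPLICATE_HEADER = "Sforce-Duplicate-Rule-Header"


def headers_for_attempt(base: dict[str, str], previous_error_code: Optional[str]) -> dict[str, str]:
    """Request headers for this attempt; the bypass is added only after a duplicate-rule failure."""
    headers = dict(base)  # copy, so the shared base is never mutated
    headers.pop(DUPLICATE_HEADER, None)  # never inherit the bypass from a default
    if previous_error_code == "DUPLICATES_DETECTED":
        headers[DUPLICATE_HEADER] = "allowSave=true"
    return headers
```

Even on the retry path, saving through the rule is a deliberate choice to create a duplicate, so it is worth logging every time the header is set.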

convertedLeadSource does not flow to the Account

When you convert a Salesforce Lead, the LeadSource field on the resulting Contact is populated. The Account gets nothing. If your reporting joins through Account, every converted lead looks sourceless. We solved this by patching the Account's AccountSource field as a post-conversion step in the agent's workflow.
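The post-conversion step, sketched as a function that only decides what to send. It assumes the converted Lead was re-read with `IsConverted`, `ConvertedAccountId`, and `LeadSource`, that the Account's current `AccountSource` is passed in, and that the org's AccountSource picklist contains the same values as LeadSource (they are separate picklists, so check this). `account_source_patch` is a hypothetical name, and it declines to overwrite an AccountSource that is already set.

```python
from typing import Any, Optional

API = "/services/data/v60.0"


def account_source_patch(lead: dict[str, Any],
                         account_source: Optional[str]) -> Optional[tuple[str, dict[str, str]]]:
    """Return (path, body) for a post-conversion PATCH of Account.AccountSource, or None."""
    if not lead.get("IsConverted") or not lead.get("ConvertedAccountId"):
        return None  # not converted yet, nothing to propagate
    lead_source = lead.get("LeadSource")
    if not lead_source or account_source:
        return None  # nothing to copy, or the Account already has a source we won't clobber
    path = f"{API}/sobjects/Account/{lead['ConvertedAccountId']}"
    return path, {"AccountSource": lead_source}
```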

Tier 2: 200 OK with quiet data loss (the emoji tier)

This is where the tulip went.

Salesforce custom text fields and 4-byte UTF-8

Salesforce stores text fields as Unicode, but the storage path on older orgs truncates 4-byte UTF-8 sequences. Most emoji, including 🌷, are 4 bytes in UTF-8. The API accepts the payload, returns 200 OK, and stores the text only up to the byte before the emoji. Everything from the emoji onward is dropped, so a signature line that says "Groet, Marieke 🌷 (school start vandaag)" becomes "Groet, Marieke ". The fix is a schema change rather than a code change: move the affected fields to Long Text Area or Rich Text Area, both of which use the 4-byte-capable storage path.
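Until the field migration lands, the agent can at least flag payloads that will be cut. A minimal sketch: in Python a character needs 4 bytes in UTF-8 exactly when its code point is above U+FFFF, which is what `has_4byte_utf8` tests. `simulate_truncation` models the behaviour we observed (keep everything before the first 4-byte character); it is not a Salesforce API, and all helper names here are hypothetical.

```python
from typing import Any


def has_4byte_utf8(text: str) -> bool:
    """True if any character lies outside the BMP, i.e. encodes to 4 bytes in UTF-8."""
    return any(ord(ch) > 0xFFFF for ch in text)


def at_risk_fields(payload: dict[str, Any]) -> list[str]:
    """Names of string fields that an affected text field would truncate."""
    return [name for name, value in payload.items()
            if isinstance(value, str) and has_4byte_utf8(value)]


def simulate_truncation(text: str) -> str:
    """Model of the observed behaviour: keep only what precedes the first 4-byte character."""
    for i, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            return text[:i]
    return text
```

Note that some emoji sit inside the BMP (☀, U+2600, is 3 bytes), which is why the check is on code points rather than on "is this an emoji".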

HubSpot v3 vs form-submission emoji handling

HubSpot's CRM v3 API accepts emoji in singleline_text properties without issue. The legacy form-submission endpoint, POST /uploads/form/v2/..., does not. It returns 200 OK and stores the text with the emoji stripped, no error, no header. If your agent ever falls back to that endpoint (we did, to keep marketing-attribution cookies intact), every emoji disappears silently.

Pipedrive search index drops 4-byte UTF-8

Pipedrive's /persons/search endpoint runs a Unicode normalization step that strips 4-byte sequences from the searchable index. The person record itself keeps the emoji, but a search for the name as stored on the record comes back empty. We had a BDR insist that "Daan 🚀 Visser" did not exist in the CRM. He was there, indexed as "Daan  Visser": two spaces, no rocket.
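A lookup fallback, sketched under the assumption that the index simply drops 4-byte characters and that `/persons/search` matches on the remaining words. `index_form` approximates what the index holds (including the double space it leaves behind); `search_terms` gives the agent a second, emoji-free term to try. Both names are hypothetical.

```python
import re


def index_form(name: str) -> str:
    """Approximation of the indexed name: 4-byte characters removed, spacing left as-is."""
    return "".join(ch for ch in name if ord(ch) <= 0xFFFF)


def search_terms(name: str) -> list[str]:
    """Terms to try in order: the name as written, then the emoji-free, space-collapsed form."""
    stripped = re.sub(r"\s+", " ", index_form(name)).strip()
    return [name] if stripped == name else [name, stripped]
```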

Salesforce Bulk API 2.0 hides failures in a separate CSV

The Bulk 2.0 successfulResults and failedResults endpoints split successes and failures into separate CSVs. The job-status endpoint returns 200 OK with state: "JobComplete" even when half the records failed. If your agent only checks job status, you are flying blind.

JOB_ID=$1
curl -sS "$SF_INSTANCE/services/data/v60.0/jobs/ingest/$JOB_ID/failedResults" \
  -H "Authorization: Bearer $SF_TOKEN" \
  | wc -l   # > 1 means the job-status endpoint hid failures
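Counting lines with `wc -l` is a coarse signal: a quoted error message containing a newline counts twice. A slightly sturdier sketch parses the CSV properly and also checks the job-info payload, assuming it carries `state` and `numberRecordsFailed` (as Bulk API 2.0 job info does) and that each failedResults row includes `sf__Id` and `sf__Error` next to the submitted columns. Fetching the two documents is left out, and the helper names are hypothetical.

```python
import csv
import io
from typing import Any


def job_hid_failures(job_info: dict[str, Any]) -> bool:
    """JobComplete is not success if any record failed."""
    return (job_info.get("state") == "JobComplete"
            and int(job_info.get("numberRecordsFailed", 0)) > 0)


def failed_records(failed_csv: str) -> list[dict[str, str]]:
    """Parse the failedResults CSV into one dict per failed record.

    The csv module keeps quoted multi-line errors inside a single row, unlike a line count.
    """
    return list(csv.DictReader(io.StringIO(failed_csv, newline="")))
```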

HubSpot batch update returns COMPLETE_WITH_ERRORS

The HubSpot POST /crm/v3/objects/contacts/batch/update endpoint returns a 200 OK with a status field of either "COMPLETE" or "COMPLETE_WITH_ERRORS". Many SDKs check the HTTP status only and treat COMPLETE_WITH_ERRORS as success. Read the body. Always.
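Reading the body, as a minimal sketch. It assumes the response carries a top-level `status` and, when something failed, an `errors` array with one entry per failed object; treat those field names as assumptions to check against your own responses. Anything other than a clean `COMPLETE` with no errors counts as a failure, and `batch_failures` is a hypothetical name.

```python
from typing import Any


def batch_failures(body: dict[str, Any]) -> list[dict[str, Any]]:
    """Per-object errors from a batch response; an empty list means every write landed."""
    errors = list(body.get("errors", []))
    if body.get("status") != "COMPLETE" and not errors:
        # A non-COMPLETE status without details is still not a success.
        errors.append({"status": body.get("status"), "message": "batch did not report COMPLETE"})
    return errors
```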

Warning

Every API in this tier returns 200 OK on partial data loss. If your agent's success criterion is "HTTP status 2xx," you have a data-quality bomb on a timer.

Tier 3: the merely annoying

These cost time but don't quietly destroy data.

  • Salesforce composite endpoint: sub-request failures show up in the body, but the top-level HTTP is 200 unless you set allOrNone: true. Set it.
  • HubSpot associationTypeId mismatch: an unknown ID falls back to the default association without warning. Validate IDs against /crm/v4/associations/{from}/{to}/labels at startup.
  • Pipedrive v1 add_time: accepts ISO 8601, stores in the account's timezone. Send timestamps with explicit offsets and confirm the stored value with a follow-up GET during testing.
  • HubSpot UTM properties: every form submission overwrites hs_analytics_source and friends. Snapshot original attribution to a separate custom property on first touch.
  • Salesforce PATCH on empty string: nulls the field rather than leaving it untouched. Strip empty strings from the payload in the agent before forwarding.
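Three of these fixes are small enough to live as payload helpers. A sketch with hypothetical names: `strip_empty_strings` for the PATCH rule, `composite_body` wrapping sub-requests with `allOrNone` (the Salesforce composite request body uses the `allOrNone` and `compositeRequest` keys), and `with_explicit_offset` for outbound timestamps, which refuses naive datetimes instead of guessing a timezone. Whether a given Pipedrive field accepts the offset form is still worth confirming with the follow-up GET.

```python
from datetime import datetime, timezone
from typing import Any


def strip_empty_strings(payload: dict[str, Any]) -> dict[str, Any]:
    """Drop "" values so a Salesforce PATCH leaves those fields untouched.

    None is kept on purpose: an explicit null is treated as an intentional clear.
    """
    return {name: value for name, value in payload.items() if value != ""}


def composite_body(subrequests: list[dict[str, Any]]) -> dict[str, Any]:
    """Composite request body with allOrNone set, so one failure rolls back the rest."""
    return {"allOrNone": True, "compositeRequest": subrequests}


def with_explicit_offset(ts: datetime) -> str:
    """ISO 8601 string with an explicit UTC offset."""
    if ts.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before sending")
    return ts.astimezone(timezone.utc).isoformat()
```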

The reference card

This is the card we pinned in the team channel. Rank is "how badly does this destroy data if you ship blind."

| # | Vendor | Quirk | Symptom |
|---|---|---|---|
| 1 | Salesforce | Lead merge keeps the master's LeadSource even if blank | Attribution overwritten with "Web - Default" |
| 2 | HubSpot | Contact merge collapses lifecyclestage history | Dashboards lose stage timelines |
| 3 | Pipedrive | Deal merge drops non-primary custom fields | Contract references vanish |
| 4 | Salesforce | allowSave=true bypasses dedupe | Duplicates pile up under retries |
| 5 | Salesforce | convertedLeadSource doesn't reach Account | Converted leads look sourceless |
| 6 | Salesforce | 4-byte UTF-8 truncation on text fields | Strings cut off at first emoji |
| 7 | HubSpot | Form-submission endpoint strips emoji | Pasted signatures lose characters |
| 8 | Pipedrive | Search index strips 4-byte UTF-8 | Records exist but can't be found |
| 9 | Salesforce | Bulk 2.0 hides failures in failedResults CSV | Half a job fails silently |
| 10 | HubSpot | Batch update returns COMPLETE_WITH_ERRORS | SDK consumers miss failures |
| 11 | Salesforce | Composite returns 200 without allOrNone | Partial writes go undetected |
| 12 | HubSpot | Bad associationTypeId falls back silently | Wrong association created |
| 13 | Pipedrive | v1 add_time stored in account timezone | Off-by-N-hours timestamps |
| 14 | HubSpot | UTM properties auto-overwrite on form submit | Original attribution lost |
| 15 | Salesforce | PATCH "" nulls field instead of skipping | Fields blanked on partial updates |

What we changed in the agent

Three patterns came out of this rollout. First, every merge call is preceded by a diff snapshot: we GET both records, store the union of non-empty field values in a small Postgres audit table, and only then POST the merge. If the merged result loses a value that was present pre-merge, the agent re-PATCHes it. Second, every batch and bulk endpoint has a follow-up reader that pulls the per-object status, not just the HTTP code. Third, the agent sends every outbound payload as application/json; charset=utf-8 and validates the round-trip on a random 1% sample by re-GETting the record and comparing strings byte-for-byte. That last check is what surfaced the tulip.
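The third pattern, sketched. `SAMPLE_RATE`, `should_verify`, and `roundtrip_mismatches` are hypothetical names, and the sketch assumes flat records keyed by field name. For Python `str` values, comparing UTF-8 encodings gives the same answer as `==`; encoding first just keeps the byte-for-byte intent visible in the code.

```python
import random
from typing import Any, Optional

SAMPLE_RATE = 0.01  # the 1% sample described above
_RNG = random.Random()


def should_verify(rng: Optional[random.Random] = None) -> bool:
    """Decide whether this write gets a read-back check."""
    return (rng or _RNG).random() < SAMPLE_RATE


def roundtrip_mismatches(sent: dict[str, Any], stored: dict[str, Any]) -> list[str]:
    """String fields whose stored bytes differ from the bytes we sent."""
    mismatched = []
    for field, value in sent.items():
        if not isinstance(value, str):
            continue
        stored_value = stored.get(field)
        stored_bytes = stored_value.encode("utf-8") if isinstance(stored_value, str) else b""
        if value.encode("utf-8") != stored_bytes:
            mismatched.append(field)
    return mismatched
```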

The snapshot pattern is short enough to paste into any worker queue:

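from typing import Any, Callable

Record = dict[str, Any]

def _is_empty(value: Any) -> bool:
    # Only None and "" count as empty, so False and 0 survive the snapshot.
    return value is None or value == ""

def safe_merge(primary_id: str, duplicate_ids: list[str],
               vendor_get: Callable[[str], Record],
               vendor_merge_fn: Callable[[str, list[str]], Record],
               vendor_patch: Callable[[str, Record], None]) -> Record:
    # 1. Snapshot: first non-empty value per field wins, primary first.
    #    (Writing the snapshot to the audit table is omitted here.)
    snapshot: Record = {}
    for rec_id in [primary_id, *duplicate_ids]:
        for field, value in vendor_get(rec_id).items():
            if not _is_empty(value) and field not in snapshot:
                snapshot[field] = value

    # 2. Merge, then re-read: merge responses are not always the full record.
    merged_id = vendor_merge_fn(primary_id, duplicate_ids)["id"]
    merged = vendor_get(merged_id)

    # 3. Re-PATCH every field the merge blanked, in a single call.
    lost = {f: v for f, v in snapshot.items() if _is_empty(merged.get(f))}
    if lost:
        vendor_patch(merged_id, lost)

    return vendor_get(merged_id)  # round-trip verify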

The first time we shipped this, the round-trip verify fired within an hour. It wasn't an emoji that time. It was a Dutch surname containing the ij digraph, stored as two separate letters (i + j) in HubSpot but as the single ligature character ĳ (U+0133) in our normalization layer. The bytes didn't match, even though the strings rendered identically in every UI we looked at. We added a Unicode normalization pass on every outbound payload after that; for the ĳ case it has to be NFKC, because NFC leaves U+0133 untouched. It is not data loss in the literal sense, but it broke the equality check, and the equality check is the whole point.
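The difference between the normalization forms, as a small sketch with a hypothetical `normalize_strings` helper for flat payloads. NFC merges a decomposed accent (e followed by a combining diaeresis) into the precomposed ë, but leaves the ĳ ligature alone; only the compatibility form NFKC turns ĳ into i + j. NFKC also folds other characters (superscript ² becomes 2, for example), so applying it to stored values is a bigger decision than applying it only to the equality check.

```python
import unicodedata
from typing import Any


def normalize_strings(payload: dict[str, Any], form: str = "NFC") -> dict[str, Any]:
    """Normalize every string value in a flat payload; other values pass through unchanged."""
    return {name: unicodedata.normalize(form, value) if isinstance(value, str) else value
            for name, value in payload.items()}
```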

Detection in production

The cheatsheet earns its rent on the day a vendor changes behavior under you. We run three checks on a schedule. The first is a canary record per CRM, written and read every five minutes by a small worker: if the round-trip string equality fails on any field, PagerDuty fires. The second is a daily diff over the previous 24 hours of LeadSource values, segmented by source integration. If any segment shifts more than 10% week-over-week without a known campaign change, someone gets a Slack notification. The third is an hourly query against the audit table for fields the agent had to re-PATCH after a merge: a spike in re-PATCHes means a vendor behavior shifted under us, and a new row needs to go on the cheatsheet.
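The second check, reduced to its comparison step. A sketch under two assumptions: a segment is a (source integration, LeadSource) pair, and "shifts more than 10%" means the segment's lead count changed by more than 10% relative to the same window last week. Suppressing known campaign changes and sending the Slack message are left out; `drifted_segments` is a hypothetical name.

```python
from collections.abc import Mapping

Segment = tuple[str, str]  # (source integration, LeadSource value)


def drifted_segments(this_week: Mapping[Segment, int],
                     last_week: Mapping[Segment, int],
                     threshold: float = 0.10) -> list[Segment]:
    """Segments whose count moved by more than `threshold` (relative) week-over-week."""
    flagged = []
    for segment in sorted(set(this_week) | set(last_week)):
        before = last_week.get(segment, 0)
        after = this_week.get(segment, 0)
        if before == 0:
            if after > 0:
                flagged.append(segment)  # a brand-new segment is always worth a look
            continue
        if abs(after - before) / before > threshold:
            flagged.append(segment)
    return flagged
```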

Three weeks after we put these checks in, the canary caught a quiet Salesforce change: convertedLeadSource began populating the Account record on some orgs but not others. We saw the drift within the hour, instead of finding it in a quarterly attribution audit, which in practice would have meant someone in finance asking, in October, why Q2's numbers had wandered.

When we built the sales-enablement AI agent for the Utrecht client, we learned that "200 OK" means "I received your request," not "I wrote your data." Our fix was to make the agent distrust every vendor's success response and verify writes by reading them back on a sample.

The smallest thing you can do today: pull the last 200 leads created by any integration on your Salesforce org, group by LeadSource, and count how many are "Web - Default" when the integration was supposed to set something else. If that number is greater than zero, you have at least one quirk on this list, and you have it in production right now.
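The same check as a sketch. It assumes you know the user IDs your integrations write as (the list passed in is hypothetical) and that you run the SOQL through the REST query resource (`/services/data/v60.0/query?q=...`). Salesforce IDs are alphanumeric, so the sketch quotes them without escaping. "Web - Default" is the value from our org; substitute your own default.

```python
from collections import Counter
from typing import Any, Iterable


def recent_integration_leads_soql(integration_user_ids: list[str], limit: int = 200) -> str:
    """SOQL for the newest leads created by the given integration users."""
    # Salesforce IDs are alphanumeric, so plain quoting is enough for this sketch.
    ids = ", ".join(f"'{uid}'" for uid in integration_user_ids)
    return ("SELECT Id, LeadSource, CreatedById FROM Lead "
            f"WHERE CreatedById IN ({ids}) "
            f"ORDER BY CreatedDate DESC LIMIT {limit}")


def group_by_lead_source(records: Iterable[dict[str, Any]]) -> Counter:
    """Count leads per LeadSource value (None for blank)."""
    return Counter(record.get("LeadSource") for record in records)
```

With the query response in hand, `group_by_lead_source(response["records"])["Web - Default"]` is the number to look at.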

Key takeaway

Every CRM REST API in this rollout returned 200 OK on partial data loss — if your success check is HTTP status, you have a data-quality bomb on a timer.

FAQ

Does Salesforce really overwrite leadSource on a duplicate merge?

Yes. The master record's LeadSource wins even when it's blank. PATCH the master with the duplicate's value first if the master is empty, then call the merge endpoint.

Why does HubSpot return 200 OK when batch updates fail?

The batch endpoint returns COMPLETE_WITH_ERRORS in the body with per-object status codes. The HTTP status only reflects whether the request was received, not whether all writes succeeded.

How do I keep emoji in custom CRM fields?

Move Salesforce text fields to Long Text Area or Rich Text Area for 4-byte UTF-8 support. In HubSpot, use the v3 CRM API rather than the legacy form-submission endpoint. Pipedrive keeps emoji on the record but still strips them from its search index.

What's the smallest check that catches most of these quirks?

Read the record back after the write on a random sample and compare strings byte-for-byte. Byte-equality is a stronger health check than HTTP 2xx for any CRM integration.

