CRM API quirks: silent merges and dropped emoji in prod
We rolled out a sales-enablement agent across 24 BDRs in Utrecht. Three weeks in, the audit logs showed every leadSource attribution had been overwritten.

The BDR sat down at 09:14 on a Tuesday in Utrecht and pasted a Dutch customer's email signature into a lead note. The signature ended in a tulip — 🌷 — and a line about the customer's daughter starting school that week. Our sales-enablement agent picked up the note, de-duplicated against a Salesforce Lead that had been imported from a LinkedIn export the prior Friday, merged the two records, and pushed the result back to the CRM API. The response was 200 OK. The tulip disappeared. The leadSource, which had said "LinkedIn — Joost referral", was overwritten with "Web - Default". Nobody noticed for three weeks.
We're fourteen agents into production at this point, and this one — a sales-enablement agent for a 24-person B2B SaaS team — taught us more about CRM REST quirks than the previous thirteen combined. The team was split across three CRMs: Salesforce for the enterprise pipeline, HubSpot for the SMB motion, and Pipedrive holding three years of legacy deals nobody wanted to migrate. Below is the cheatsheet that came out the other side, ranked by which quirks silently destroy data versus which ones merely annoy.
Tier 1: silent leadSource overwrites on merge
These are the ones we now wrap in a pre-merge snapshot. If you do nothing else after reading this post, do this.
Salesforce Lead.merge picks the master's leadSource, even if blank
The Salesforce composite Lead endpoints document that "the master record wins" for most fields, but the wording hides a sharp edge: LeadSource on the master wins even when the master's value is blank and the duplicate's value is the real attribution. We confirmed this in a sandbox with seven test rows. The fix is sequence-sensitive:
curl -X PATCH \
"$SF_INSTANCE/services/data/v60.0/sobjects/Lead/$MASTER_ID" \
-H "Authorization: Bearer $SF_TOKEN" \
-H "Content-Type: application/json; charset=utf-8" \
-d "{\"LeadSource\": \"$DUPLICATE_LEAD_SOURCE\"}"
# THEN run the merge. Order matters.
We patch the master with the duplicate's LeadSource first if the master is blank, then merge. The merge will not undo a PATCH that ran a millisecond earlier.
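The blank-master check is easy to get subtly wrong, so here is a minimal sketch of the decision in Python. The function name and the dict-shaped records are assumptions for illustration; only the LeadSource field name comes from the text above. Building the body as a dict and serializing it with json.dumps also sidesteps the quoting problems of hand-assembled JSON in a shell string.

```python
import json
from typing import Optional


def lead_source_patch(master: dict, duplicate: dict) -> Optional[dict]:
    """Return the PATCH body to send to the master before merging, or None.

    Hypothetical helper: both arguments are Lead records as plain dicts.
    """
    master_source = (master.get("LeadSource") or "").strip()
    duplicate_source = (duplicate.get("LeadSource") or "").strip()
    # Copy only when the master is blank and the duplicate has a real value.
    # A populated master is left alone so the merge behaves as documented.
    if not master_source and duplicate_source:
        return {"LeadSource": duplicate_source}
    return None


body = lead_source_patch(
    {"Id": "master-id", "LeadSource": None},
    {"Id": "duplicate-id", "LeadSource": "LinkedIn"},
)
if body is not None:
    print(json.dumps(body))  # send this as the PATCH payload, then merge
```

Returning None when nothing needs copying keeps the caller honest: no PATCH is sent unless there is attribution to rescue.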
HubSpot contact merge collapses lifecyclestage history
HubSpot's contacts API exposes POST /crm/v3/objects/contacts/merge, and the docs are clear that the primary contact's properties win. What they don't make obvious is that the property-history endpoint loses the secondary's lifecyclestage timeline entirely. If the secondary went MQL → SQL → Opportunity and the primary stayed at "lead," the merged record looks like a lead that has never moved. For attribution dashboards this is fatal.
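One way to protect against this is to copy the secondary's stage timeline somewhere safe before calling merge. The sketch below assumes the secondary contact was read with property history requested and that the payload carries a propertiesWithHistory object holding a list of entries with value and timestamp keys. That shape is our reading of the v3 read response and should be confirmed against a real payload. The function name and sample data are hypothetical.

```python
from typing import List, Tuple


def lifecycle_timeline(contact: dict) -> List[Tuple[str, str]]:
    """Extract (timestamp, stage) pairs, oldest first, from a contact payload.

    Assumes ISO 8601 UTC timestamps in one consistent format, so sorting
    the strings also sorts them chronologically.
    """
    history = contact.get("propertiesWithHistory", {}).get("lifecyclestage", [])
    entries = [
        (entry["timestamp"], entry["value"])
        for entry in history
        if entry.get("timestamp") and entry.get("value")
    ]
    return sorted(entries)


# Hypothetical secondary contact that moved MQL -> SQL -> Opportunity.
secondary = {
    "id": "secondary-id",
    "propertiesWithHistory": {
        "lifecyclestage": [
            {"value": "opportunity", "timestamp": "2024-03-01T10:00:00Z"},
            {"value": "marketingqualifiedlead", "timestamp": "2024-01-05T09:00:00Z"},
            {"value": "salesqualifiedlead", "timestamp": "2024-02-10T12:00:00Z"},
        ]
    },
}

for timestamp, stage in lifecycle_timeline(secondary):
    print(timestamp, stage)  # persist these rows before calling merge
```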
Pipedrive deal merge silently drops non-primary custom fields
Pipedrive's deal merge endpoint returns 200 OK with the merged deal body. Custom fields set on the non-primary deal but unset on the primary are not in the response. They are also not in the deal afterwards. There is no warning header, no X-Discarded body field, nothing. We caught it because a finance custom field — the contract reference for invoicing — vanished on three merges in one afternoon.
Salesforce duplicate-rule bypass
The Sforce-Duplicate-Rule-Header: allowSave=true header is the recommended way to push a record through a strict duplicate rule. It also silently skips the dedupe altogether. We caught this in audit logs: 41 leads created in a single afternoon, all with allowSave=true, all duplicates of existing leads. The agent had been instructed to set the header on retry. It set it on every call.
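The guard we wanted is small: the header goes on only when the caller explicitly says this is a retry after a duplicate-rule rejection. A sketch, with the function and parameter names as illustrative assumptions:

```python
def lead_write_headers(token: str, retry_after_duplicate_block: bool = False) -> dict:
    """Build request headers for a Lead write.

    The bypass header is opt-in and defaults to off, so a first attempt
    always goes through the duplicate rule.
    """
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json; charset=utf-8",
    }
    if retry_after_duplicate_block:
        headers["Sforce-Duplicate-Rule-Header"] = "allowSave=true"
    return headers


first_attempt = lead_write_headers("example-token")
retry = lead_write_headers("example-token", retry_after_duplicate_block=True)
```

Making the flag a keyword argument with a False default means the unsafe path has to be spelled out at every call site, which is easier to spot in review and in audit logs.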
convertedLeadSource does not flow to the Account
When you convert a Salesforce Lead, the LeadSource field on the resulting Contact is populated. The Account gets nothing. If your reporting joins through Account, every converted lead looks sourceless. We solved this by patching the Account's AccountSource field as a post-conversion step in the agent's workflow.
Tier 2: 200 OK with quiet data loss (the emoji tier)
This is where the tulip went.
Salesforce custom text fields and 4-byte UTF-8
Salesforce stores text fields as Unicode, but the storage path on older orgs truncates 4-byte UTF-8 sequences. Most emoji, including 🌷, are 4 bytes in UTF-8. The API accepts the payload, returns 200 OK, and stores the text up to the byte before the emoji. The rest of the string is also dropped, so a signature line that says "Groet, Marieke 🌷 (school start vandaag)" becomes "Groet, Marieke ". The fix is org-level: move to Long Text Area or Rich Text Area, both of which use the four-byte-capable storage path.
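Until the field type is changed, the agent can at least detect strings that would be cut. The sketch below relies on one fact about UTF-8: code points above U+FFFF are exactly the ones encoded in four bytes. The function names are ours, and the simulated truncation mirrors the behavior described above rather than anything Salesforce documents.

```python
from typing import Optional


def first_four_byte_index(text: str) -> Optional[int]:
    """Return the index of the first character that needs 4 bytes in UTF-8."""
    for index, char in enumerate(text):
        # Code points above U+FFFF are the ones UTF-8 encodes in 4 bytes.
        if ord(char) > 0xFFFF:
            return index
    return None


def predicted_stored_value(text: str) -> str:
    """Simulate the truncation described above: keep only what precedes the emoji."""
    index = first_four_byte_index(text)
    return text if index is None else text[:index]


signature = "Groet, Marieke 🌷 (school start vandaag)"
if first_four_byte_index(signature) is not None:
    # Flag the write for a Long Text Area field, or for manual review.
    print("would be stored as:", repr(predicted_stored_value(signature)))
```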
HubSpot v3 vs form-submission emoji handling
HubSpot's CRM v3 API accepts emoji in singleline_text properties without issue. The legacy form-submission endpoint, POST /uploads/form/v2/..., does not. It returns 200 OK and stores the text with the emoji stripped, no error, no header. If your agent ever falls back to that endpoint (we did, to keep marketing-attribution cookies intact), every emoji disappears silently.
Pipedrive search index drops 4-byte UTF-8
Pipedrive's /persons/search endpoint runs a Unicode normalization step that strips 4-byte sequences from the searchable index. The person record itself keeps the emoji. Searching for the name afterwards does not. We had a BDR insist that "Daan 🚀 Visser" did not exist in the CRM. He was there, indexed as "Daan Visser" with two spaces and no rocket.
Salesforce Bulk API 2.0 hides failures in a separate CSV
The Bulk 2.0 successfulResults and failedResults endpoints split successes and failures into separate CSVs. The job-status endpoint returns 200 OK with state: "JobComplete" even when half the records failed. If your agent only checks job status, you are flying blind.
JOB_ID=$1
curl -sS "$SF_INSTANCE/services/data/v60.0/jobs/ingest/$JOB_ID/failedResults" \
-H "Authorization: Bearer $SF_TOKEN" \
| wc -l # > 1 means the job-status endpoint hid failures
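Counting lines is a quick smoke test, but it miscounts when a quoted CSV field contains a newline, and it says nothing about why rows failed. A Python sketch that parses the same failedResults body and groups rows by error follows. It assumes the error text is in the sf__Error column, and the sample rows are invented for illustration.

```python
import csv
import io


def summarize_failed_results(csv_text: str) -> dict:
    """Count failed rows per error message from a failedResults CSV body."""
    counts: dict = {}
    # csv.DictReader handles quoted fields that span several lines.
    for row in csv.DictReader(io.StringIO(csv_text)):
        error = row.get("sf__Error") or "unknown"
        counts[error] = counts.get(error, 0) + 1
    return counts


# Invented sample: three failed rows, one with a newline inside a quoted field.
sample = (
    "sf__Id,sf__Error,LastName\n"
    ",DUPLICATES_DETECTED,Visser\n"
    ',DUPLICATES_DETECTED,"de\nJong"\n'
    ",REQUIRED_FIELD_MISSING,Bakker\n"
)

failures = summarize_failed_results(sample)
if failures:
    print("job reported complete but rows failed:", failures)
```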
HubSpot batch update returns COMPLETE_WITH_ERRORS
The HubSpot POST /crm/v3/objects/contacts/batch/update endpoint returns a 200 OK with a status field of either "COMPLETE" or "COMPLETE_WITH_ERRORS". Many SDKs check the HTTP status only and treat COMPLETE_WITH_ERRORS as success. Read the body. Always.
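Reading the body can be a few lines. The sketch below treats anything other than a clean "COMPLETE" as a failure and returns whatever per-object detail the response carries. The errors key and the sample bodies are assumptions about the response shape, so check them against a real payload before relying on this.

```python
def batch_failures(body: dict) -> list:
    """Return per-object errors from a batch response body; an empty list means clean."""
    errors = list(body.get("errors", []))
    if body.get("status") != "COMPLETE" and not errors:
        # The status signals trouble but carries no detail: surface it anyway.
        errors.append({"message": f"batch status was {body.get('status')!r}"})
    return errors


clean = {"status": "COMPLETE", "results": [{"id": "1"}]}
partial = {
    "status": "COMPLETE_WITH_ERRORS",
    "results": [{"id": "1"}],
    "errors": [{"message": "hypothetical validation error for object 2"}],
}

for label, body in (("clean", clean), ("partial", partial)):
    problems = batch_failures(body)
    print(label, "ok" if not problems else problems)
```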
Every API in this tier returns 200 OK on partial data loss. If your agent's success criterion is "HTTP status 2xx," you have a data-quality bomb on a timer.
Tier 3: the merely annoying
These cost time but don't quietly destroy data.
- Salesforce composite endpoint: sub-request failures show up in the body, but the top-level HTTP is 200 unless you set allOrNone: true. Set it.
- HubSpot associationTypeId mismatch: an unknown ID falls back to the default association without warning. Validate IDs against /crm/v4/associations/{from}/{to}/labels at startup.
- Pipedrive v1 add_time: accepts ISO 8601, stores in the account's timezone. Send timestamps with explicit offsets and confirm the stored value with a follow-up GET during testing.
- HubSpot UTM properties: every form submission overwrites hs_analytics_source and friends. Snapshot original attribution to a separate custom property on first touch.
- Salesforce PATCH on empty string: nulls the field rather than leaving it untouched. Strip empty strings server-side before forwarding.
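The last item on the list is the easiest to automate. A minimal sketch, assuming the payload is a flat dict; the function name is ours:

```python
def strip_empty_strings(payload: dict) -> dict:
    """Drop keys whose value is exactly "" so a PATCH leaves those fields alone.

    None, 0 and False are kept on purpose: they may be deliberate values.
    """
    return {key: value for key, value in payload.items() if value != ""}


update = {"Company": "Example BV", "Phone": "", "NumberOfEmployees": 0, "Title": None}
print(strip_empty_strings(update))
```

Filtering on equality with the empty string, rather than on truthiness, is deliberate: a truthiness check would also drop zeros and explicit nulls that the caller meant to send.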
The reference card
This is the card we pinned in the team channel. Rank is "how badly does this destroy data if you ship blind."
| # | Vendor | Quirk | Symptom |
|---|---|---|---|
| 1 | Salesforce | Lead merge keeps the master's LeadSource even if blank | Attribution overwritten with "Web - Default" |
| 2 | HubSpot | Contact merge collapses lifecyclestage history | Dashboards lose stage timelines |
| 3 | Pipedrive | Deal merge drops non-primary custom fields | Contract references vanish |
| 4 | Salesforce | allowSave=true bypasses dedupe | Duplicates pile up under retries |
| 5 | Salesforce | convertedLeadSource doesn't reach Account | Converted leads look sourceless |
| 6 | Salesforce | 4-byte UTF-8 truncation on text fields | Strings cut off at first emoji |
| 7 | HubSpot | Form-submission endpoint strips emoji | Pasted signatures lose characters |
| 8 | Pipedrive | Search index strips 4-byte UTF-8 | Records exist but can't be found |
| 9 | Salesforce | Bulk 2.0 hides failures in failedResults CSV | Half a job fails silently |
| 10 | HubSpot | Batch update returns COMPLETE_WITH_ERRORS | SDK consumers miss failures |
| 11 | Salesforce | Composite returns 200 without allOrNone | Partial writes go undetected |
| 12 | HubSpot | Bad associationTypeId falls back silently | Wrong association created |
| 13 | Pipedrive | v1 add_time stored in account timezone | Off-by-N-hours timestamps |
| 14 | HubSpot | UTM properties auto-overwrite on form submit | Original attribution lost |
| 15 | Salesforce | PATCH "" nulls field instead of skipping | Fields blanked on partial updates |
What we changed in the agent
Three patterns came out of this rollout. First, every merge call is preceded by a diff snapshot: we GET both records, store the union of non-empty field values to a small Postgres audit table, and only then POST the merge. If the merged result loses a value that was present pre-merge, the agent re-PATCHes it. Second, every batch and bulk endpoint has a follow-up reader that pulls the per-object status, not just the HTTP code. Third, the agent normalizes payloads to application/json; charset=utf-8 on the way out and validates the round-trip on a random 1% sample by re-GETting the record and comparing strings byte-for-byte. That last check is what surfaced the tulip.
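The third pattern, the sampled round-trip check, is roughly the following. This is a sketch under assumptions: the function names are ours, the records are flat dicts, and fetching the stored record is left to the caller.

```python
import random


def should_verify(sample_rate: float = 0.01) -> bool:
    """Decide whether this write is part of the verification sample."""
    return random.random() < sample_rate


def round_trip_mismatches(sent: dict, stored: dict) -> dict:
    """Compare string fields byte-for-byte; return {field: (sent, stored)} for diffs."""
    mismatches = {}
    for field, sent_value in sent.items():
        if not isinstance(sent_value, str):
            continue  # only string fields are compared in this sketch
        stored_value = stored.get(field)
        stored_bytes = (
            stored_value.encode("utf-8", "surrogatepass")
            if isinstance(stored_value, str)
            else None
        )
        if sent_value.encode("utf-8", "surrogatepass") != stored_bytes:
            mismatches[field] = (sent_value, stored_value)
    return mismatches


sent = {"Signature__c": "Groet, Marieke 🌷"}
stored = {"Signature__c": "Groet, Marieke "}  # what a re-GET might return
print(round_trip_mismatches(sent, stored))
```

The surrogatepass error handler is there so that a malformed string coming back from an API produces a mismatch instead of an encoding exception.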
The snapshot pattern is short enough to paste into any worker queue:
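def safe_merge(primary_id, duplicate_ids, vendor_merge_fn):
    # vendor_get and vendor_patch are the per-CRM read and write helpers,
    # assumed to be defined elsewhere in the worker.
    # Snapshot the first non-empty value seen per field, primary record first.
    snapshot = {}
    for rec_id in [primary_id, *duplicate_ids]:
        record = vendor_get(rec_id)
        for field, value in record.items():
            if value and field not in snapshot:
                snapshot[field] = value
    merged = vendor_merge_fn(primary_id, duplicate_ids)
    # Restore any field that had a value before the merge but is empty after it.
    for field, original_value in snapshot.items():
        if not merged.get(field):
            vendor_patch(merged["id"], {field: original_value})
    return vendor_get(merged["id"])  # round-trip verify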
The first time we shipped this, the round-trip verify fired within an hour. It wasn't an emoji that time. It was a Dutch surname containing the ij digraph, stored as two separate characters in HubSpot but as a single character in our normalization layer. The bytes didn't match, even though the strings rendered identically in every UI we looked at. We added a Unicode NFC normalization pass on every outbound payload after that. It is not data loss in the literal sense, but it broke the equality check, and the equality check is the whole point.
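For reference, an NFC pass over a flat payload is only a few lines with Python's standard unicodedata module. The function name is ours. One caveat worth knowing: as far as we can tell from the Unicode data, NFC composes base letters with combining marks but leaves the single-code-point ligature U+0133 alone. Folding that ligature into the two letters i and j takes NFKC, which also folds other compatibility characters, so which form to use depends on what the vendor actually stores.

```python
import unicodedata


def normalize_payload(payload: dict) -> dict:
    """Apply NFC to every string value; other values pass through unchanged."""
    return {
        key: unicodedata.normalize("NFC", value) if isinstance(value, str) else value
        for key, value in payload.items()
    }


decomposed = "Zoe\u0308"  # "Zoë" written as e + combining diaeresis
composed = "Zo\u00eb"     # "Zoë" with the precomposed character
print(decomposed == composed)                                   # different code points
print(normalize_payload({"name": decomposed})["name"] == composed)

# Caveat: NFC leaves the ij ligature (U+0133) as it is; NFKC splits it.
ligature = "\u0133sselmeer"
print(unicodedata.normalize("NFC", ligature) == ligature)
print(unicodedata.normalize("NFKC", ligature) == "ijsselmeer")
```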
Detection in production
The cheatsheet earns its rent on the day a vendor changes behavior under you. We run three checks on a continuous loop. The first is a canary record per CRM, written and read every five minutes by a small worker: if the round-trip string equality fails on any field, PagerDuty fires. The second is a daily diff over the previous 24 hours of LeadSource values, segmented by source integration. If any segment shifts more than 10% week-over-week without a known campaign change, someone gets a Slack notification. The third is an hourly query against the audit table for fields the agent had to re-PATCH after a merge: a spike in re-PATCHes means a vendor behavior shifted under us, and a new row needs to go on the cheatsheet.
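The second check reduces to a comparison of two count tables. The sketch below reads "shifts more than 10%" as a relative change in each segment's lead count, which is one reasonable interpretation rather than the only one. The function name and sample numbers are invented.

```python
def shifted_segments(previous: dict, current: dict, threshold: float = 0.10) -> dict:
    """Return {segment: relative_change} for segments that moved more than threshold."""
    flagged = {}
    for segment in sorted(set(previous) | set(current)):
        before = previous.get(segment, 0)
        after = current.get(segment, 0)
        if before == 0:
            # A segment appearing from nothing is always worth a look.
            change = float("inf") if after > 0 else 0.0
        else:
            change = (after - before) / before
        if abs(change) > threshold:
            flagged[segment] = change
    return flagged


last_week = {"LinkedIn": 100, "Web - Default": 50, "Event": 40}
this_week = {"LinkedIn": 60, "Web - Default": 90, "Event": 42}

for segment, change in shifted_segments(last_week, this_week).items():
    print(f"{segment}: {change:+.0%}")  # candidates for the Slack notification
```

A falling named source paired with a rising "Web - Default" is the signature of the merge overwrite described in Tier 1.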
Three weeks after we put these checks in, the canary caught a quiet Salesforce change: convertedLeadSource began populating the Account record on some orgs but not others. We saw the drift within the hour. We did not have to find it from a quarterly attribution audit that would have been mostly someone in finance asking, in October, why Q2's numbers had wandered.
When we built the sales-enablement agent for the Utrecht client, the lesson was that "200 OK" means "I received your request," not "I wrote your data." We solved it by making the agent distrust every vendor's success response and verify each write by reading it back on a sample.
The smallest thing you can do today: pull the last 200 leads created by any integration on your Salesforce org, group by LeadSource, and count how many are "Web - Default" when the integration was supposed to set something else. If that number is greater than zero, you have at least one quirk on this list, and you have it in production right now.
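Once the leads are exported, the count itself is a one-pass tally. The sketch assumes each lead is a dict with a LeadSource value and an integration key naming whatever created it. The integration key is hypothetical, so substitute the field your org uses to identify the creating integration.

```python
from collections import Counter


def default_source_counts(leads: list, default: str = "Web - Default") -> dict:
    """Count leads per integration whose LeadSource fell back to the default."""
    counts: Counter = Counter()
    for lead in leads:
        if lead.get("LeadSource") == default:
            counts[lead.get("integration", "unknown")] += 1
    return dict(counts)


# Invented export rows for illustration.
recent_leads = [
    {"LeadSource": "Web - Default", "integration": "sales-agent"},
    {"LeadSource": "LinkedIn", "integration": "sales-agent"},
    {"LeadSource": "Web - Default", "integration": "sales-agent"},
    {"LeadSource": "Web - Default", "integration": "web-form"},
]

print(default_source_counts(recent_leads))
```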
Key takeaway
Every CRM REST API in this rollout returned 200 OK on partial data loss — if your success check is HTTP status, you have a data-quality bomb on a timer.
FAQ
Does Salesforce really overwrite leadSource on a duplicate merge?
Yes. The master record's LeadSource wins even when it's blank. PATCH the master with the duplicate's value first if the master is empty, then call the merge endpoint.
Why does HubSpot return 200 OK when batch updates fail?
The batch endpoint returns COMPLETE_WITH_ERRORS in the body with per-object status codes. The HTTP status only reflects whether the request was received, not whether all writes succeeded.
How do I keep emoji in custom CRM fields?
Move Salesforce text fields to Long Text Area or Rich Text Area for 4-byte UTF-8 support. In HubSpot, use the v3 CRM API rather than the legacy form-submission endpoint. Pipedrive will still strip them from search.
What's the smallest check that catches most of these quirks?
Read the record back after the write on a random sample and compare strings byte-for-byte. Byte-equality is a stronger health check than HTTP 2xx for any CRM integration.