Inbox-agent quirks: 17 traps in Graph, Gmail, Exchange

Seventeen Microsoft Graph, Gmail, and Exchange Online quirks from a Tilburg accountantskantoor's triage rollout, ranked by which ones silently lose data.

Jacob Molkenboer · Founder · A Brand New Company · 18 Jun 2026 · 9 min

It's a Tuesday in March, 22:47, and the operations lead at a 26-person accountantskantoor (accounting firm) in Tilburg is forwarding her ninth jaarrekening (annual financial statement) of the evening to a client. The triage agent we shipped two weeks earlier was supposed to thread these into a single conversation, tag the file, and queue a follow-up. Instead, the agent sees seven separate threads. The 5.3 MB PDF never arrived in the client's inbox at all, but our log shows a clean 202 Accepted from Microsoft Graph. Nothing was retried, because nothing failed.

That night we started a quirks list. It is now seventeen entries long, ranked by how badly each one bites you before you notice. This is the cheatsheet we wish we had on day one.

How we ranked them

Loud failures are easy. A 429 throttle, a 403 permissions error, a 400 with a sensible message: those all surface in any half-decent observability stack. What kills you is the silent failure: the API that returns success and either drops data, rewrites a field your join depends on, or quietly truncates the work. Tier 1 is where data went missing before anyone noticed: a forwarded message that lost its conversationId, a 2xx that lost the attached PDF, and the neighbouring traps that stay quiet unless you are explicitly watching for them. Tier 2 is loud but expensive to misdiagnose. Tier 3 is structural; it will not break your week, but it will shape your architecture.

Tier 1: silent data loss

1. Graph /sendMail with attachments over 4 MB

The single worst trap in Microsoft Graph. If you POST a message with a base64-encoded attachment in the attachments array and the file is larger than 4 MB, the server returns... it depends. Sometimes 413, sometimes a 400, and the case that ate our jaarrekening: 202 Accepted with the attachment quietly stripped from the outbound MIME body. Microsoft documents that anything over 3 MB should use an upload session, and the silent-strip behaviour kicks in around 4 MB. The fix is non-negotiable: always use createUploadSession for anything larger than 3 MB. Do not trust the 2xx. The canonical pattern, in Python:

import requests

GRAPH = "https://graph.microsoft.com/v1.0"
# 3 MiB per PUT; Microsoft's docs require each PUT to stay under 4 MB
CHUNK = 3 * 1024 * 1024

def attach_large(token: str, user_id: str, msg_id: str,
                 name: str, blob: bytes) -> None:
    """Attach blob to an existing draft message via an upload session."""
    create = f"{GRAPH}/users/{user_id}/messages/{msg_id}" \
             f"/attachments/createUploadSession"
    body = {"AttachmentItem": {
        "attachmentType": "file", "name": name, "size": len(blob),
    }}
    resp = requests.post(
        create, json=body,
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()  # fail loudly if the session was not created
    upload_url = resp.json()["uploadUrl"]

    size = len(blob)
    for start in range(0, size, CHUNK):
        end = min(start + CHUNK, size) - 1
        # The upload URL is pre-authenticated: no Authorization header here
        r = requests.put(upload_url, data=blob[start:end + 1], headers={
            "Content-Type": "application/octet-stream",
            "Content-Range": f"bytes {start}-{end}/{size}",
        })
        r.raise_for_status()
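
Deciding when to take the upload-session path is worth a helper of its own. The sketch below assumes the 3 MB threshold from the text and checks the base64-encoded size rather than the raw file size; `INLINE_LIMIT`, `encoded_size`, and `needs_upload_session` are illustrative names, not part of any SDK.

```python
INLINE_LIMIT = 3 * 1024 * 1024  # assumed threshold: 3 MB, per the text above


def encoded_size(raw_len: int) -> int:
    """Length of raw_len bytes after standard base64 encoding with padding."""
    return 4 * ((raw_len + 2) // 3)


def needs_upload_session(blob: bytes, limit: int = INLINE_LIMIT) -> bool:
    """True when the base64-rendered attachment would exceed the inline limit."""
    return encoded_size(len(blob)) > limit
```

A file that is 2.8 MB on disk encodes to roughly 3.7 MB, so it already belongs on the upload-session path.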

2. conversationId rewrites on Outlook web forwards

Forward a message from Outlook on the web, and the new message gets a fresh conversationId. Forward the same message from Outlook desktop, and the conversationId is preserved. We chased this for three days before we realised the client mix mattered. If your agent threads on conversationId alone, OWA forwards will fragment the thread. Fall back to internetMessageId and the In-Reply-To / References headers; those survive the round trip.

3. Gmail watch subscriptions expire silently after 7 days

Gmail push notifications go through Pub/Sub. The watch call returns an expiration timestamp, and after that timestamp the notifications stop. No final notification, no 410, no callback. If your renewal cron is wedged, you find out when a client asks why their email from Tuesday wasn't triaged. Renew daily, and alert on a missed renewal — a heartbeat on the renewal job is cheaper than a missed-mail postmortem.
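
A heartbeat on the renewal job can be as small as the sketch below. It assumes the watch response carries `expiration` as a string of epoch milliseconds, and that you persist the time of the last successful renewal yourself; the 26-hour tolerance and the function names are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def watch_expiry(watch_response: dict) -> datetime:
    """Convert the watch response's expiration (epoch ms string) to UTC."""
    millis = int(watch_response["expiration"])
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


def renewal_overdue(last_renewed: datetime, now: datetime,
                    max_age: timedelta = timedelta(hours=26)) -> bool:
    """True when the daily renewal has not succeeded within max_age."""
    return now - last_renewed > max_age
```

Alerting on `renewal_overdue` rather than on the expiry itself leaves several days of slack before notifications actually stop.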

4. Gmail historyId gaps on inactive mailboxes

Gmail keeps history records for a limited window. If a mailbox sits inactive and unsynced for more than about seven days, Gmail prunes the history. Your stored historyId is now older than the oldest available record, and history.list returns a 404. There is no way to fetch the missed delta; you have to full-sync. Detect the 404, run a full sync, and resume. Store the new historyId only after the resync commits; if you store it eagerly and crash mid-sync, the next run resumes from the new historyId and never fetches the messages the interrupted sync had not reached.
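
The ordering is the whole point, so here is a minimal sketch of it. The three callables are hypothetical hooks you would supply: `fetch_delta` wraps history.list and raises `HistoryExpired` on a 404, `full_sync` returns the new historyId only once its writes have committed, and `save_id` persists the cursor.

```python
from typing import Callable


class HistoryExpired(Exception):
    """Raised by fetch_delta when history.list answers 404."""


def sync_mailbox(stored_id: str,
                 fetch_delta: Callable[[str], str],
                 full_sync: Callable[[], str],
                 save_id: Callable[[str], None]) -> str:
    """Run a delta sync, falling back to a full sync on expired history.

    The cursor is saved only after the sync step has returned, so a crash
    mid-sync leaves the old cursor in place and the next run starts over.
    """
    try:
        new_id = fetch_delta(stored_id)
        mode = "delta"
    except HistoryExpired:
        new_id = full_sync()
        mode = "full"
    save_id(new_id)
    return mode
```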

5. Graph delta tokens older than 30 days

Same shape, different vendor. Microsoft Graph delta tokens for mail expire after roughly 30 days. The error here is at least a 410 rather than a silent skip, but the recovery path is the same: full re-sync. Build it on day one, not day thirty. The same recovery code path also covers the case where you change the $select set on the delta query — Graph invalidates the token and 410s.

6. Gmail batchModify caps at 1000 IDs

Pass 1001 message IDs to batchModify and you get a 400 with a message that does not name the cap. One customer moved a 4,000-message label and saw three quarters of it silently skip. Chunk to 1000.
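
Chunking is a few lines; a sketch with an illustrative helper name, assuming the 1000-ID cap stated above:

```python
from typing import Iterator, List, Sequence

BATCH_MODIFY_MAX = 1000  # cap per batchModify call, per the text above


def chunk_ids(ids: Sequence[str],
              size: int = BATCH_MODIFY_MAX) -> Iterator[List[str]]:
    """Yield consecutive slices of ids, each at most size long."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])
```

Issue one batchModify call per chunk and check each response, so a failure on one chunk cannot pass unnoticed.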

7. Shared-mailbox permissions on Graph

Acting on a shared mailbox with delegated permissions needs Mail.ReadWrite.Shared and Mail.Send.Shared, not the non-Shared variants. The non-Shared scopes work fine against the signed-in user's own mailbox and fail with a 403 on the shared one. Worse: if you also configured Mail.ReadWrite the consent screen looks complete, and you only discover the gap when you try to act on the shared inbox. Verify by issuing a read against the shared mailbox, using the same identity the agent will run as, before wiring the send path; the read fails fast and cheaply.

Tier 2: loud but expensive to misdiagnose

8. Graph throttling is per-app per-tenant AND per-mailbox

The published Outlook limit is 10,000 requests per 10 minutes, and Microsoft's throttling documentation evaluates it per app and mailbox combination. The less-discussed limit that bit us is 4 concurrent requests per mailbox. With 26 employees and a triage agent that fans out, we sat well under the volume cap and still got throttled on individual mailboxes. Concurrency, not volume, was the constraint. We run a token bucket keyed on the mailbox principal, capped at three concurrent calls and 600 requests per minute, with exponential backoff that honours the Retry-After header. Rate-limiting globally on the tenant does not save you.
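
The concurrency half of that limiter can be sketched roughly as follows. This is a simplified, thread-based illustration rather than our production code: it covers the three-slot cap and the Retry-After handling, and leaves out the 600-requests-per-minute bucket and jitter. All names are illustrative.

```python
import threading
from collections import defaultdict
from typing import Optional

MAX_CONCURRENT_PER_MAILBOX = 3  # one below the documented cap of 4

_slots = defaultdict(
    lambda: threading.BoundedSemaphore(MAX_CONCURRENT_PER_MAILBOX))
_slots_lock = threading.Lock()


def mailbox_slot(mailbox: str) -> threading.BoundedSemaphore:
    """Return the semaphore that guards calls for one mailbox."""
    with _slots_lock:
        return _slots[mailbox]


def backoff_delay(attempt: int, retry_after: Optional[str],
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retrying a throttled call.

    A numeric Retry-After header wins; otherwise fall back to capped
    exponential backoff (base * 2**attempt).
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch
    return min(cap, base * 2 ** attempt)
```

Wrap each Graph call in `with mailbox_slot(mailbox):` and sleep for `backoff_delay(attempt, response.headers.get("Retry-After"))` on a 429.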

9. Graph webhook subscriptions: 4230-minute ceiling

Microsoft Graph mail subscriptions cap at 4230 minutes — just under three days. Renew at 80% of TTL, not the last minute. We saw renewals queued behind throttles miss the window during a busy week, and the resulting gap was invisible until reconciliation.
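
Computing the renewal point is simple arithmetic. The sketch assumes the 4230-minute ceiling and the 80% rule stated above; `renew_at` is an illustrative name.

```python
from datetime import datetime, timedelta

GRAPH_MAIL_TTL = timedelta(minutes=4230)  # ceiling stated in the text above


def renew_at(created: datetime, ttl: timedelta = GRAPH_MAIL_TTL,
             fraction: float = 0.8) -> datetime:
    """Time at which a subscription created at `created` should be renewed."""
    return created + ttl * fraction
```

At 80% of 4230 minutes the renewal fires after 3384 minutes, which leaves about 14 hours of slack for a renewal stuck behind a throttle.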

10. Gmail attachments use base64url, not base64

The data field on an attachment is base64url-encoded without padding. Run it through a standard base64 decoder and you get a corrupted file. In Python:

import base64

def decode_gmail_attachment(data: str) -> bytes:
    # base64url, padding stripped
    padding = 4 - (len(data) % 4)
    if padding != 4:
        data += "=" * padding
    return base64.urlsafe_b64decode(data)

11. Domain-wide delegation needs both ends

Gmail's domain-wide delegation requires the OAuth client to be enabled with the scope in the Google Admin console AND the scope requested at token-exchange time. Set it in one place but not the other and the call returns 403 with a vague message. Configure the admin console first, then test against a user mailbox you control before promoting the change to production tenants.

12. Outlook "Send on behalf" vs "Send as"

"Send on behalf" needs Mail.Send.Shared; "Send as" needs the mailbox-level Send As permission set in Exchange Online, not in Azure AD. We had a partner think we'd shipped a bug for two days because his confirmation emails said "via" the agent address.

Tier 3: structural — shape your architecture around these

13. EWS retires October 2026

If you inherit a system that uses Exchange Web Services, the migration timer is already running. Microsoft retires EWS in Exchange Online on 1 October 2026. As of mid-2026 that is roughly a quarter away. Plan the Graph migration now — the API surface is not 1:1 and you will need to reimplement throttling, attachments, and subscription handling.

14. conversationId and Gmail threadId are not interchangeable

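If your agent spans both providers, you need a normalization layer built on the Message-ID, In-Reply-To, and References headers. Cross-provider threading on vendor IDs alone will fragment within a week. The recipe we use: take the thread root (the first entry in References, else In-Reply-To, else the message's own Message-ID) and use sha256(root) as the canonical thread key, with conversationId and threadId as secondary lookup indexes. The key has to come from the root alone, because mixing each message's own Message-ID into the hash gives every message in the thread a different key.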
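
One way to sketch a header-based thread key is below. This variant hashes only the thread root, which is an assumption on our part about what makes the key stable: every message in a thread then maps to the same value. The function names are illustrative, and headers are assumed to arrive as raw strings.

```python
import hashlib
from typing import Optional


def thread_root(message_id: str, in_reply_to: Optional[str],
                references: Optional[str]) -> str:
    """Pick the oldest known ancestor: first References entry, else
    In-Reply-To, else the message's own Message-ID."""
    refs = (references or "").split()
    if refs:
        return refs[0]
    if in_reply_to and in_reply_to.strip():
        return in_reply_to.strip()
    return message_id.strip()


def thread_key(message_id: str, in_reply_to: Optional[str] = None,
               references: Optional[str] = None) -> str:
    """SHA-256 hex digest of the thread root, usable as a provider-neutral key."""
    root = thread_root(message_id, in_reply_to, references)
    return hashlib.sha256(root.encode("utf-8")).hexdigest()
```

A reply whose References header was stripped falls back to In-Reply-To, which only matches the root for first-level replies; that gap is where the vendor IDs earn their place as secondary indexes.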

15. Gmail labels are user-scoped

A label on user A's mailbox is not the same entity as the same-named label on user B's mailbox. They have different IDs. If you want a tenant-wide taxonomy, store the label name and look up the ID per user when you apply the label.
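
A per-user lookup with a small cache might look like the sketch below. `list_labels` is a hypothetical callable you would back with the Gmail labels.list call; it is assumed to return dicts with `id` and `name` keys.

```python
from typing import Callable, Dict, List


class LabelResolver:
    """Resolve a tenant-wide label name to the per-user Gmail label ID."""

    def __init__(self, list_labels: Callable[[str], List[dict]]) -> None:
        self._list_labels = list_labels
        self._cache: Dict[str, Dict[str, str]] = {}

    def label_id(self, user: str, name: str) -> str:
        """Return the ID of `name` in `user`'s mailbox; KeyError if absent."""
        if user not in self._cache:
            labels = self._list_labels(user)
            self._cache[user] = {lab["name"]: lab["id"] for lab in labels}
        return self._cache[user][name]
```

The cache never invalidates in this sketch, so a renamed or deleted label needs a fresh resolver.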

16. Graph internetMessageHeaders caps at 5 on send

Want to round-trip more than 5 custom headers on outbound mail through Graph? You cannot. Five is the documented cap, and the 6th silently drops. Pack state into a single JSON-encoded header if you need more.
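
Packing state into one header can be sketched as below. The header name `x-agent-state` is made up for illustration (Graph expects custom header names to start with `x-`), and the list-of-dicts shape follows the `internetMessageHeaders` property on a message.

```python
import json
from typing import Dict, List

STATE_HEADER = "x-agent-state"  # hypothetical header name


def pack_state_header(state: dict,
                      name: str = STATE_HEADER) -> List[Dict[str, str]]:
    """Serialize state into a single custom header entry."""
    value = json.dumps(state, separators=(",", ":"), sort_keys=True)
    return [{"name": name, "value": value}]


def unpack_state_header(headers: List[Dict[str, str]],
                        name: str = STATE_HEADER) -> dict:
    """Read state back from a header list; empty dict when absent."""
    for header in headers:
        if header["name"].lower() == name.lower():
            return json.loads(header["value"])
    return {}
```

Keep the payload small: header values have practical length limits, so store identifiers here and the bulky state in your own database.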

17. Outlook web client strips inline signature images on agent forwards

The least urgent, the most embarrassing. If your agent forwards a message that contains an inline-image signature, the OWA renderer often drops the image and leaves a broken alt tag in place. Strip them yourself before the forward and re-attach as a flat footer.

The day-one stack we ended up with

After those seventeen, the architecture collapsed into five rules:

  • Treat 2xx as "queued," not "done." Every send writes a pending record keyed on the SHA-256 of the rendered MIME body, and only clears when a delivery webhook fires or a sent-items presence check finds the same hash within five minutes.
  • Thread on a hash of the thread root (first References entry, else In-Reply-To, else the message's own Message-ID), with conversationId / threadId as a secondary key. The first 30 chars of the subject serve as a fallback key so quoted-reply chains with a missing References still cluster.
  • Upload sessions for anything over 3 MB. Always. Even when the file is "probably 2.8 MB." The size check runs on the rendered bytes, not the input file, because base64 expansion crosses the threshold for plenty of borderline cases.
  • Log every send as a structured event with the body hash, recipient count, and attachment count. The reconciliation job replays those events against the sent-items folder. A missing match after five minutes pages on-call with the original payload pre-attached, so the fix is a re-send button rather than an investigation.
  • Run a synthetic mailbox per tenant that gets a known test message hourly. If the agent does not process it within 90 seconds, page someone. This catches expired Graph subscriptions, throttle storms, and Gmail watch renewal failures before any client notices.
Warning

If you take one thing from this list: in Microsoft Graph, 202 Accepted is not delivery confirmation. It is "we'll get to it." Build your reconciliation loop on the assumption that any given send did not actually go out.
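
The "queued, not done" rule from the list above can be sketched as a small pending-record store. This is an in-memory illustration with made-up names; a real version would persist the records and feed `seen_hashes` from the sent-items check or the delivery webhook.

```python
import hashlib
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


class PendingSends:
    """Track sends by body hash until they are confirmed in sent items."""

    def __init__(self, deadline: timedelta = timedelta(minutes=5)) -> None:
        self._pending: Dict[str, datetime] = {}
        self._deadline = deadline

    def record(self, mime_body: bytes, sent_at: datetime) -> str:
        """Register a send that returned 2xx; returns its SHA-256 digest."""
        digest = hashlib.sha256(mime_body).hexdigest()
        self._pending[digest] = sent_at
        return digest

    def reconcile(self, seen_hashes: Iterable[str],
                  now: datetime) -> List[str]:
        """Clear confirmed sends and return digests overdue for paging."""
        for digest in seen_hashes:
            self._pending.pop(digest, None)
        return sorted(d for d, sent_at in self._pending.items()
                      if now - sent_at > self._deadline)
```

Anything `reconcile` returns is a send that got a 2xx but never showed up, which is exactly the case the original incident fell into.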

The smallest thing you can do today

Open your mail-API client and grep for sendMail. For every call site, check whether the payload includes attachments and whether the size is verified before the call. If you are using POST /me/sendMail with attachments inline and you have never explicitly capped them, you have a silent-drop bug waiting for the next 5 MB PDF. While you are there, search the same codebase for any retry on a Graph 2xx response — if there are none, your reconciliation loop is the gap.

When we built the inbox-triage AI agents for the Tilburg accountantskantoor, the lesson that cost us the most was the first one on the list: trusting a 2xx. We ended up writing a thin reconciliation layer that compares every send against the sent-items folder five minutes later, and that single check has caught more silent drops than every retry we built combined.

Key takeaway

In Microsoft Graph, a 202 Accepted on /sendMail is not delivery; build your reconciliation loop on the assumption that any send may have silently dropped.

FAQ

Does Microsoft Graph really return 202 Accepted on a failed send?

Yes. /sendMail accepts the request for asynchronous processing and does not guarantee SMTP delivery. Verify with a sent-items check or a delivery webhook before clearing the pending record.

How do I send a mail attachment larger than 4 MB through Microsoft Graph?

Use createUploadSession to chunk the file. Inline base64 attachments over 3 MB are unreliable, and around 4 MB the attachment can be silently stripped while the API still returns 202 Accepted.

Why does a Gmail watch subscription stop firing without warning?

The watch call returns an expiration timestamp, after which notifications stop with no final callback. Renew daily and alert when a renewal fails so you do not lose a day of inbound mail.

Can I use one threading key across Gmail and Outlook?

No. Gmail's threadId and Graph's conversationId are not interchangeable. Normalize on a hash of the RFC 5322 Message-ID, In-Reply-To, and References headers if your agent spans both.
