Graph API throttling: 19 retry-after quirks the SDK hides

The agent had been answering inbox triage tickets for three days at a Den Haag law firm before we noticed it was quietly dropping every fifth reply thread.

Jacob Molkenboer · Founder · A Brand New Company · 11 Jun 2026 · 9 min

It was a Tuesday morning when one of the partners walked into our standup with a printout. A client had emailed twice in three days asking about a deposition date. The triage agent had read both messages. It had also marked both as handled. The actual reply never went out, and nobody on the team had seen the thread.

The firm has about 52 lawyers and paralegals across two floors in Den Haag. We had spent four weeks wiring an email-triage agent into their Microsoft 365 tenant. It read incoming mail, classified each message by case area and urgency, drafted replies for the partner to sign, and posted a row to a Power Automate flow that fed their case-management system. On paper the agent was working. Throughput looked clean on the dashboard.

The real picture was uglier. We had a Microsoft Graph throttling problem, and the official Graph SDK was hiding most of it.

What was actually happening

The SDK retry middleware caught the 429s and the 503s. It backed off, retried, and returned a 200 to our application code. Logs looked green. What it did not tell us was that on certain retry paths the message we were operating on had a different conversationId from the one we had written into our database the first time around. The thread our agent thought it was replying to no longer existed. The new draft landed against a different conversation, which the partner never saw because their Outlook view was sorted by the original conversation header.

We logged every quirk we hit, ranked them by how much silent damage each one caused, and shipped a custom retry handler. What follows is the cheatsheet. Pin it next to anyone touching Microsoft Graph throttling in production.

The cheatsheet, ranked by silent damage

Top of the list is anything that returns a 200 to your code while losing the work. Bottom of the list is anything that throws cleanly. The middle is where the wreckage lives.

Tier 1: the SDK swallows it and you lose state

conversationId mutates on retry across folders. If a Power Automate flow or a user-side rule moves the message between folders while your call is mid-retry, the resource resolves to a different conversation. The SDK returns 200. Your database now points at a stale thread.
Delta token invalidated by a throttled write. A 429 on a POST against a mailbox that you are also /delta-syncing can invalidate the delta state token on the next call. The SDK quietly fetches a fresh token and replays the last page of messages. Your agent processes them twice.
Application token and delegated token throttle pools are not the same. Two services hammering the same mailbox under different auth flows do not share a quota in any documented way, and two services on the same flow can still starve each other. Our Power Automate connector and our agent both ran on delegated permissions and tripped each other's limits without either one logging the cause.
Subscription expires mid-retry, agent reconnects with a fresh clientState. Change-notification subscriptions live for 4230 minutes max for messages. If the SDK retry loop spans the renewal window, the next notification arrives with a different validation token and your gate rejects it.
Batch endpoint hides per-request 429s under a 200. A $batch call with 20 sub-requests can return 200 OK overall while individual sub-requests in the body have status 429. The SDK does not retry those sub-requests by default. A handler that checks only the outer status will count, say, five throttled sub-requests as "successful" operations that were never executed.
Token refresh during a retry drops the request. If the access token expires between the original call and the retry, some middleware versions request a new token but never re-issue the underlying HTTP request. Your code sees a normal completion with an empty body.
ETag mismatch on second attempt causes a silent skip. A conditional update retried after a 503 hits a different ETag because another client touched the resource in between. The SDK's inner loop gets a 412, treats it as transient and swallows it, and your PATCH never lands.
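One cheap guard against the first quirk is to re-read the message after any call that needed a retry and compare it with what you stored. This is a minimal sketch, assuming you persist both internetMessageId and conversationId for each triaged message and can fetch the current values (for example with $select=conversationId,internetMessageId). The ThreadRef type and checkThreadDrift function are hypothetical names, not SDK API.

```typescript
// Hypothetical shape: the two fields persisted per triaged message.
interface ThreadRef {
  internetMessageId: string;
  conversationId: string;
}

type DriftVerdict = "same-thread" | "thread-drifted" | "different-message";

// Compare the stored reference with a fresh read of the same message
// taken after the SDK reported success.
export function checkThreadDrift(stored: ThreadRef, current: ThreadRef): DriftVerdict {
  if (stored.internetMessageId !== current.internetMessageId) {
    // The resource we resolved is not the mail we think it is.
    return "different-message";
  }
  if (stored.conversationId !== current.conversationId) {
    // Same mail, but the thread pointer in our database is now stale.
    return "thread-drifted";
  }
  return "same-thread";
}
```

Anything other than "same-thread" should block the reply and raise an alert, because this is exactly the case the SDK reports as a clean 200.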

Tier 2: visible failures the SDK reports oddly

Retry-After in seconds vs Retry-After-Ms. Graph sometimes returns Retry-After in whole seconds and sometimes a custom Retry-After-Ms in milliseconds. The SDK parses only Retry-After. When both are present and your code mixes up the units, your backoff calculation is wrong by three orders of magnitude.
503 with no Retry-After at all. Several outage-style 503s arrive with no retry hint. The SDK falls back to exponential backoff with jitter that can exceed the actual outage window by a wide margin. We saw 90-second sleeps for outages that resolved in 4 seconds.
Per-mailbox 10k/10min window slides instead of resetting. The limit is documented as 10,000 requests per 10 minutes per mailbox, but it behaves as a sliding window, not a fixed bucket. A long-running batch that starts at minute 9:59 expecting a reset at 10:00 still carries the previous ten minutes of traffic and can get half-throttled.
4 concurrent requests per mailbox is not per endpoint. The cap applies across read and write endpoints combined. A delta sync running in the background consumes one slot even when idle on the wire.
Retry-After in HTTP-date format crosses midnight. When Graph returns an absolute HTTP-date instead of a delta, certain SDK versions in non-UTC timezones parse a past time and retry immediately, hammering the throttle that just kicked in.
/me/messages and /users/{id}/messages are different throttle classes. Switching from the delegated /me shortcut to the explicit user-ID path during a refactor changed our quota class and the throttle profile silently shifted.
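Because the concurrency cap covers reads and writes together, the limiter has to sit in front of every call to a mailbox, not in front of each endpoint. This is a minimal sketch of a per-mailbox counting semaphore, assuming the four-request figure above. MailboxLimiter is a hypothetical helper, not part of the Graph SDK, and a background delta sync should be budgeted as holding one of the four slots.

```typescript
// Counting semaphore keyed by mailbox. Every Graph call against a mailbox,
// read or write, goes through run() and holds one slot while in flight.
export class MailboxLimiter {
  private readonly inFlight = new Map<string, number>();
  private readonly waiters = new Map<string, Array<() => void>>();

  constructor(private readonly maxPerMailbox = 4) {}

  // Run one Graph call while holding a slot for this mailbox.
  async run<T>(mailbox: string, task: () => Promise<T>): Promise<T> {
    await this.acquire(mailbox);
    try {
      return await task();
    } finally {
      this.release(mailbox);
    }
  }

  private acquire(mailbox: string): Promise<void> {
    const used = this.inFlight.get(mailbox) ?? 0;
    if (used < this.maxPerMailbox) {
      this.inFlight.set(mailbox, used + 1);
      return Promise.resolve();
    }
    // No free slot: park until release() hands one over.
    return new Promise<void>((resolve) => {
      const queue = this.waiters.get(mailbox) ?? [];
      queue.push(() => resolve());
      this.waiters.set(mailbox, queue);
    });
  }

  private release(mailbox: string): void {
    const queue = this.waiters.get(mailbox);
    const next = queue?.shift();
    if (queue && queue.length === 0) this.waiters.delete(mailbox);
    if (next) {
      // Hand the slot straight to the next waiter; the in-flight count is unchanged.
      next();
      return;
    }
    const used = (this.inFlight.get(mailbox) ?? 1) - 1;
    if (used > 0) this.inFlight.set(mailbox, used);
    else this.inFlight.delete(mailbox);
  }
}
```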

Tier 3: the operational gotchas

Mixed batches are charged to the write quota in full. A $batch with one write and 19 reads counts all 20 requests against the write-side budget.
x-ms-throttle-limit-percentage is not exposed by the official SDK. Graph emits a header that tells you exactly how close you are to a throttle, but the SDK abstracts it away. You have to write a custom delegating handler to read it.
Exponential backoff exceeds Retry-After. The SDK's default backoff curve, after three retries, sleeps longer than the server asked it to. Your worker thread is parked while the bucket has already refilled.
internetMessageId is stable, id is not. Idempotency keyed off id breaks the moment a message is moved or replied to. The internetMessageId (the RFC 5322 header) survives. Index your dedup table on that.
ClientRequestId reuse across retries collides with idempotency caches. If you reuse the same client-request-id across an SDK retry path, Graph occasionally returns the cached response from the first attempt, which may be the 429 itself.
Polling /delta during a throttled notification window misses items. If you also fall back to polling when subscription notifications stall, the delta link cursor advances past the items the subscription queued but never delivered.
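If mixed batches are charged to the write budget, the fix is to keep reads and writes apart before batching. This is a sketch, assuming the requests are independent (no dependsOn ordering between them) and using the JSON batching limit of 20 requests per $batch call. splitByQuota is a hypothetical helper.

```typescript
// Shape of one entry in a Graph JSON $batch "requests" array.
interface BatchRequest {
  id: string;
  method: "GET" | "POST" | "PATCH" | "PUT" | "DELETE";
  url: string;
  body?: unknown;
  headers?: Record<string, string>;
}

const MAX_BATCH_SIZE = 20; // Graph JSON batching accepts at most 20 requests.

function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Put reads and writes in separate batches so one write does not drag
// 19 reads onto the write-side budget. Assumes no dependsOn links.
export function splitByQuota(requests: BatchRequest[]): { reads: BatchRequest[][]; writes: BatchRequest[][] } {
  const reads = requests.filter((r) => r.method === "GET");
  const writes = requests.filter((r) => r.method !== "GET");
  return { reads: chunk(reads, MAX_BATCH_SIZE), writes: chunk(writes, MAX_BATCH_SIZE) };
}
```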

Warning

If your monitoring is "did the SDK return 200", you are not monitoring Graph. Log x-ms-resource-unit, x-ms-throttle-limit-percentage, and the per-batch sub-response status codes. The SDK will not do it for you.

The retry handler that actually worked

We replaced the default RetryHandler with a custom delegating handler. The shape, in TypeScript, looked roughly like this:

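import type { Context, Middleware } from "@microsoft/microsoft-graph-client";
import { randomUUID } from "node:crypto";

// Statuses worth retrying. Everything else (412 and other 4xx/5xx included) goes back to the caller.
const RETRYABLE = new Set([429, 503, 504]);
const MAX_ATTEMPTS = 5;
const MAX_SLEEP_MS = 30_000;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export class HonestRetry implements Middleware {
  private next?: Middleware;

  setNext(next: Middleware): void {
    this.next = next;
  }

  async execute(ctx: Context): Promise<void> {
    for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      await this.next!.execute(ctx);
      const res = ctx.response!;
      // Stop on success, on a non-transient failure, or when attempts run out.
      if (!RETRYABLE.has(res.status) || attempt === MAX_ATTEMPTS) return;

      const ms = parseRetry(res.headers);
      const pct = res.headers.get("x-ms-throttle-limit-percentage");
      // Swap console for your structured logger.
      console.warn({ attempt, status: res.status, ms, pct, url: ctx.request });

      // Honour the server. Never sleep longer than asked.
      await sleep(Math.min(ms, MAX_SLEEP_MS));

      // Re-stamp client-request-id so we never replay a cached error.
      const headers = new Headers(ctx.options?.headers);
      headers.set("client-request-id", randomUUID());
      ctx.options = { ...ctx.options, headers };
    }
  }
}

function parseRetry(h: Headers): number {
  const ms = Number.parseInt(h.get("retry-after-ms") ?? "", 10);
  if (!Number.isNaN(ms)) return ms;
  const s = h.get("retry-after");
  if (!s) return 1000; // No hint at all: short fixed wait, not exponential.
  const asInt = Number.parseInt(s, 10);
  if (!Number.isNaN(asInt)) return asInt * 1000;
  // HTTP-date format. Date.parse honours the GMT suffix, so the delta is computed in UTC.
  const t = Date.parse(s) - Date.now();
  return t > 0 ? t : 1000;
}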

Two details matter. We never sleep longer than the server requested, and we mint a fresh client-request-id on every retry so the Graph-side idempotency cache cannot replay an old 429 at us. For batch calls we wrap the response and surface any sub-request that is not 2xx as a real exception, not a swallowed warning.
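The batch wrapper described above can be sketched as a function over the parsed "responses" array of a $batch reply. It assumes throttled sub-responses carry a Retry-After header in seconds and falls back to one second when they do not. checkBatch and BatchSubRequestError are hypothetical names. Anything that is neither 2xx nor 429/503/504, a 412 included, is raised as an exception instead of being logged and skipped.

```typescript
// One entry of the "responses" array in a Graph JSON $batch reply.
interface BatchSubResponse {
  id: string;
  status: number;
  headers?: Record<string, string>;
  body?: unknown;
}

export class BatchSubRequestError extends Error {
  constructor(public readonly failures: BatchSubResponse[]) {
    super(`batch sub-requests failed: ${failures.map((f) => `${f.id}=${f.status}`).join(", ")}`);
    this.name = "BatchSubRequestError";
  }
}

// Case-insensitive header lookup; sub-response header casing varies.
function header(h: Record<string, string> | undefined, name: string): string | undefined {
  if (!h) return undefined;
  const key = Object.keys(h).find((k) => k.toLowerCase() === name);
  return key === undefined ? undefined : h[key];
}

// Throws on real failures; otherwise returns the ids to re-send and how long to wait.
export function checkBatch(responses: BatchSubResponse[]): { retryIds: string[]; waitMs: number } {
  const retryable = new Set([429, 503, 504]);
  const failures = responses.filter((r) => (r.status < 200 || r.status >= 300) && !retryable.has(r.status));
  if (failures.length > 0) throw new BatchSubRequestError(failures);

  const throttled = responses.filter((r) => retryable.has(r.status));
  // Wait for the longest Retry-After among throttled sub-requests (1 s if absent).
  const waitMs = throttled.reduce((max, r) => {
    const secs = Number.parseInt(header(r.headers, "retry-after") ?? "", 10);
    return Math.max(max, Number.isNaN(secs) ? 1000 : secs * 1000);
  }, 0);
  return { retryIds: throttled.map((r) => r.id), waitMs };
}
```

The caller re-sends only the returned ids in a fresh batch after waiting waitMs. Throwing before retrying is a deliberate choice: a failed sub-request is a bug to look at, while a throttled one is only a timing problem.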

Idempotency keyed on the right field

The other change that paid off immediately: we stopped using message.id as the dedup key in our triage database and switched to internetMessageId. The id rotates on folder moves and on sent-items replays. The RFC 5322 header does not. If your agent is replying, forwarding, or routing mail, this is the field you want.
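A minimal sketch of the dedup check, with an in-memory Set standing in for the triage database table (in production this would be a unique index on the internetMessageId column). MailItem and ProcessedStore are hypothetical names.

```typescript
// Minimal message fields; internetMessageId is the RFC 5322 Message-ID header.
interface MailItem {
  id: string; // Graph id: rotates on folder moves, so never used as the key.
  internetMessageId: string;
}

// In-memory stand-in for the dedup table.
export class ProcessedStore {
  private readonly seen = new Set<string>();

  // Returns true only the first time a given internetMessageId is offered.
  claim(msg: MailItem): boolean {
    if (this.seen.has(msg.internetMessageId)) return false;
    this.seen.add(msg.internetMessageId);
    return true;
  }
}
```

A message that reappears under a new id after a move or a delta replay is claimed once and skipped after that.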

For subscriptions we now renew at 75% of the documented lifetime, not at expiry. The official guidance allows up to 4230 minutes for messages, but renewal under throttle stress can take 30 seconds. Renewing early gives the retry path room to land before the subscription evaporates.
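The 75% rule is plain arithmetic on the expirationDateTime Graph returns for the subscription. This is a sketch, assuming you record when each create or renew call succeeded; renewalTime is a hypothetical helper. For the 4230-minute maximum it schedules renewal at 3172.5 minutes, which leaves more than 17 hours of slack.

```typescript
const RENEW_FRACTION = 0.75;

// renewedAt: when the create/renew call succeeded.
// expirationDateTime: the ISO timestamp Graph returned for the subscription.
export function renewalTime(renewedAt: Date, expirationDateTime: string): Date {
  const expiresMs = Date.parse(expirationDateTime);
  if (Number.isNaN(expiresMs)) throw new Error(`bad expirationDateTime: ${expirationDateTime}`);
  const lifetimeMs = expiresMs - renewedAt.getTime();
  // Renew three quarters of the way through, not at expiry.
  return new Date(renewedAt.getTime() + lifetimeMs * RENEW_FRACTION);
}
```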

What the dashboard looks like now

We export four numbers into the firm's ops dashboard every minute: percentage of the mailbox throttle bucket consumed, count of sub-request 429s inside batch calls, count of delta-token resets in the last hour, and count of subscription renewals that took more than 5 seconds. None of those are visible if you only watch HTTP status codes.
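The first of those numbers needs a sliding window, not a counter that resets every ten minutes. This is a minimal in-memory sketch, assuming you call record once per Graph request against the mailbox and treat the documented 10,000 requests per 10 minutes as the budget. SlidingWindowMeter is a hypothetical name.

```typescript
// Sliding-window meter for one mailbox: how much of the budget the
// last windowMs milliseconds of requests have consumed.
export class SlidingWindowMeter {
  private readonly stamps: number[] = [];

  constructor(
    private readonly limit = 10_000,
    private readonly windowMs = 10 * 60 * 1000,
  ) {}

  record(nowMs: number): void {
    this.stamps.push(nowMs);
    this.evict(nowMs);
  }

  percentUsed(nowMs: number): number {
    this.evict(nowMs);
    return (this.stamps.length * 100) / this.limit;
  }

  // Drop timestamps that have slid out of the window; stamps arrive in time order.
  private evict(nowMs: number): void {
    const cutoff = nowMs - this.windowMs;
    let i = 0;
    while (i < this.stamps.length && this.stamps[i] <= cutoff) i++;
    if (i > 0) this.stamps.splice(0, i);
  }
}
```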

In the first week after the new handler shipped, we found two additional quirks we had not seen before, both of which fell into tier 1. We added them to the cheatsheet above. The list will grow. Graph is a moving target and the SDK abstractions are not on your side when you are running an agent at production volume.

The five-minute audit

If you run a Graph-based agent against a customer mailbox, grep your code for three things today. First, every place you read message.id and persist it: switch to internetMessageId, or add a second column. Second, every place you call $batch: confirm you iterate sub-response status codes, not just the outer 200. Third, every place you trust the SDK's default retry: replace it, or at least log x-ms-throttle-limit-percentage on every response so you can see the wall before you hit it.

When we built the email-triage agent for the Den Haag firm, the part that took longest was not the language work or the case-area classification. It was learning where the Graph SDK lies to you. If you are wiring AI agents into Microsoft 365 and your dashboard says everything is fine, that is the moment to check the headers the SDK is hiding.

Key takeaway

If your Graph monitoring is just "did the SDK return 200", you are not monitoring Graph. The SDK hides at least seven failure modes that cost you state.

FAQ

Does the Microsoft Graph SDK retry throttled requests automatically?

Yes, but its default retry handler swallows several failure modes as success, including per-sub-request 429s inside batch calls and token refresh drops. Always log the underlying response headers yourself.

What is the safest idempotency key for a Graph mail agent?

Use internetMessageId, the RFC 5322 header. The Graph id field rotates when a message is moved between folders or replayed through sent-items, so it is not safe for deduplication.

How long should I wait before retrying a 429 from Graph?

Use the Retry-After or Retry-After-Ms header verbatim, and do not let the SDK's exponential backoff sleep longer than the server asked you to.

Why does my agent process the same email twice after a throttling event?

A 429 on a write against a mailbox you are delta-syncing can invalidate the delta state token, so the next sync replays the last page of messages. Persist processed internetMessageIds and check on the way in.
