Integrations

Shopify webhook logs: telling app faults from network ones

Four patterns inside a Shopify webhook log that separate a flaky integration from a flaky network. Read them once and you will stop blaming the wrong layer.

Jacob Molkenboer· Founder · A Brand New Company· 7 Jun 2024· 6 min

Two brass postal tags linked by waxed thread beside a folded cream telegram with green ribbon on an ivory desk.

A Monday morning, a fulfilment lead calls. Orders are showing up twice in their warehouse software. Sometimes not at all. Shopify support replies with the line every integration engineer dreads: "we successfully delivered the webhook on our side." Your team replies with its mirror: "we never received it." Both are usually true. The fight is about whose log is wrong.

Every Shopify integration has this conversation at least once. We have run it a dozen times with clients on home-goods storefronts, subscription apps, and middleware sitting between Shopify and an ERP. The good news: the answer is almost always already in the webhook log. You just have to know what to read.

Here are the four patterns we look for first. None require new tooling. They live in the headers and timestamps you are already capturing.

The headers that carry the diagnosis

Shopify decorates every webhook with a handful of headers worth memorising. The ones that matter for triage:

X-Shopify-Webhook-Id: a UUID that is stable across every retry of the same event.
X-Shopify-Triggered-At: ISO-8601 timestamp of when the event fired on Shopify's side.
X-Shopify-Topic: for example orders/create, orders/updated.
X-Shopify-Hmac-Sha256: base64 HMAC over the raw request body.
X-Shopify-Api-Version: useful when behaviour shifts after an API upgrade.

If your log captures those five plus your own arrival timestamp, the response code you returned, and the first 200 bytes of body, you have everything triage needs.

Pattern 1: the retry ladder

Shopify retries failed webhooks up to 19 times over 48 hours, with the gap widening between attempts. Grep a single X-Shopify-Webhook-Id and you should see one row. If you see a ladder of arrivals 1, 2, 4, 8, 15 minutes apart, Shopify is telling you your endpoint returned non-2xx.

The shape of the ladder tells you which flavour of failure. A run that ends with a clean 200 means your app eventually recovered, usually after a deploy or a downstream service came back. A run that climbs to nineteen rungs and stops with a 500 each time means a logic error that is reproducible on the payload.

grep "X-Shopify-Webhook-Id: 4b...e2" access.log | awk '{print $1, $9}'
2026-06-02T09:14:01Z 500
2026-06-02T09:15:03Z 500
2026-06-02T09:17:08Z 500
2026-06-02T09:21:14Z 500
2026-06-02T09:29:31Z 200

That final 200 is the one to inspect. What changed between 09:14 and 09:29? A deploy? A cache warm-up? A foreign key that finally existed because the related customers/create webhook had landed?

Pattern 2: the HMAC drift

HMAC failures come in two flavours. The boring one is a wrong shared secret, which fails one hundred percent of the time from the moment you flip it on. The interesting one is intermittent: most webhooks verify, but a slice of them do not, and the failures cluster around a deploy or a middleware change.

Intermittent HMAC failures are almost never Shopify. They are almost always your stack mutating the request body before you sign-check it. The usual suspects, in order of how often we hit them:

A JSON body parser running before the verifier, leaving you with a re-serialised body that no longer matches the bytes Shopify hashed.
A reverse proxy rewriting newlines or stripping a trailing byte.
An emoji or other multibyte character in a customer name, decoded as the wrong charset somewhere in the chain.

The fix is always the same: verify against the raw bytes, before any parser touches them. In Express that looks like this:

app.post(
  '/webhooks/shopify',
  express.raw({ type: 'application/json' }),
  (req, res) => {
    const hmac = req.get('X-Shopify-Hmac-Sha256');
    const digest = crypto
      .createHmac('sha256', process.env.SHOPIFY_WEBHOOK_SECRET)
      .update(req.body) // Buffer, not string
      .digest('base64');
    if (digest !== hmac) return res.status(401).end();
    const event = JSON.parse(req.body.toString('utf8'));
    // ... handle event
    res.status(200).end();
  }
);

Warning

If your framework's body parser is registered globally, the verifier sees a re-serialised string and every HMAC fails. Mount the raw parser on the webhook route only, before any global JSON middleware.

Pattern 3: the silent gap

This is the one most teams misdiagnose. A merchant complains that orders are missing. Your app log shows no record of the webhook. Shopify's webhook dashboard shows it as delivered with a 200 response code. Both sides feel certain they are right.

Here, "delivered" means Shopify's TCP layer got a 200 from something. That something might be your CDN edge, your load balancer, a stale Heroku router, or a Cloudflare worker that absorbed the request and never proxied it onward. The diagnosis is to compare three logs at once: Shopify's delivery log, your edge or load balancer log, and your app log. The layer where the gap opens is the broken layer.

Network gaps have a distinct shape. They cluster in time (one bad five-minute window), they affect every topic equally, and they leave no trace at all on your app side. Application gaps affect a specific topic or a specific payload shape, and your app log shows the request landing before something goes wrong.

Pattern 4: the skew that grows

Compute received_at minus X-Shopify-Triggered-At for every webhook and chart it. On a healthy integration the line is a flat band under five seconds, with the occasional spike. Anything else is a diagnosis.

A line that drifts upward over hours means your app is processing slower than Shopify is dispatching, and Shopify's queue for your endpoint is filling. The fix is either to speed up your handler or to acknowledge the webhook in under a second and process asynchronously. Shopify is blunt about the deadline: respond within five seconds or you risk being marked unhealthy and eventually unsubscribed.

A line that spikes only on specific topics points at a downstream dependency that is slow for those payloads, like an inventory lookup, a tax calculation, or an external ERP call. A line that spikes on every topic points at your own infrastructure.

What this changes about the conversation

The reason these four patterns matter is not technical, it is political. When a merchant is losing orders, an "it's not our fault" answer that cannot point at a specific log line is worth nothing. An answer that says "your TCP layer accepted the webhook but our app never saw the request, here are the three log timestamps proving it" ends the argument and shortens the outage.

When we built the order-sync layer for a Dutch home-goods merchant, the four checks above caught a Cloudflare worker returning 200 to Shopify and silently dropping the body on every webhook over 64KB, a config drift nobody had spotted for nine days. The same playbook fits any webhook-driven integration we build.

The smallest thing you could do today: add X-Shopify-Webhook-Id, X-Shopify-Triggered-At, and your own received_at to whatever your app already logs. Five minutes of work. The next time the question lands, you will have the answer before the call ends.

Key takeaway

Four headers and one timestamp delta will tell you whether a missing Shopify webhook is your app's fault or your network's.

FAQ

How long does Shopify retry a failed webhook?

Up to 19 attempts spread over 48 hours, with widening gaps between tries. After 48 hours of failure Shopify can mark the topic unhealthy and eventually unsubscribe your endpoint.

Why do my HMAC checks fail intermittently?

Almost always because a body parser runs before the verifier and re-serialises the payload. Verify against the raw request bytes, before any JSON middleware touches the request.

Should I acknowledge a webhook before or after processing it?

Before. Return a 200 in under five seconds and process asynchronously. Shopify marks slow endpoints as unhealthy, then stops sending events to them.

Shopify says delivered, my app never saw it. What now?

Compare three logs at once: Shopify's delivery log, your edge or load-balancer log, and your app log. The layer where the gap first opens is the layer that is broken.

integrationse-commerceworkflowtoolingoperationsarchitecture

Building something?

Start a project