← Blog

Incident-walkthrough

Idempotency keys and Snowflake retries: a SEPA postmortem

A reconciliation agent in Gent quietly double-booked 1,180 SEPA transactions across nine hours. The cause was an idempotency key tied to a rotated session UUID.

Jacob Molkenboer· Founder · A Brand New Company· 13 Jun 2025· 11 min
Cream envelope with green wax seal and chartreuse ribbon, bent brass key on carbon-copy receipt, red wax stub beside.

The morning the books drifted

At 09:14 on a Tuesday, the head of operations at a 32-person fintech in Gent opened her morning reconciliation dashboard and saw a number she did not recognise. The overnight agent had written 1,180 more lines into the ledger than the inbound SEPA file actually contained. Every customer balance downstream was off, some by a few cents, some by four figures. By the time we traced the cause six hours later, it came down to a single line of SQL inside an idempotency-key derivation.

The pipeline had run cleanly at 22:00 the previous evening. The dashboard at 23:30 had been green. Somewhere between midnight and breakfast, something silently doubled most of the night's bookings, and nothing in the agent's own logs flagged it as unusual.

We spent the rest of that morning unwinding the mess with their CTO over a shared screen. The first ninety minutes went on triage. The head of operations pulled three customers whose balances were off by more than 1,000 euros and put manual holds on their outgoing transfers. Their CTO froze the morning's outbound SEPA file before it left the SFTP drop to the bank. The remaining four hours we spent inside the warehouse, working backwards from the duplicate rows toward the cause. The fact that the bug lived in a stored procedure nobody had touched in eighteen months made it harder to find, not easier. Engineers trust load-bearing code that has been quiet.

The architecture before the failure

The fintech runs a daily payment-reconciliation agent built in three layers. A scheduled Snowflake task graph picks up the morning's SEPA pain.002 and camt.054 files from a SFTP drop. A stored procedure parses them and runs a MERGE into a recon_ledger table. A Python-based classification agent then walks the ledger, flags anomalies, and queues anything suspicious for a human reviewer.

The MERGE was the load-bearing step. The Python agent ran on the assumption that the ledger was canonical, deduplicated, and frozen by the time the classifier began scanning it. That assumption had held for eighteen months, which is the worst kind of assumption. The kind that bakes itself into downstream design without anyone writing it down.

Deduplication relied on an idempotency key written into each row by the MERGE. If the same SEPA end-to-end identifier showed up twice, the key would collide and the second insert would be skipped. A standard, well-trodden pattern.

The retry that nobody saw

At 00:14, the first task in the graph hit a transient credential refresh error while reading from SFTP. The connector pulls credentials from a Snowflake external token integration, and the access token had expired two minutes earlier than the cached expiry. The retry framework caught the failure, refreshed the token, and reran the task within the configured window. Seventy-eight seconds later, the second attempt completed cleanly.

This is exactly how Snowflake's task framework is meant to behave. Transient failures retry within a window, and only persistent failures escalate. The team had configured both the retry count and the suspension behaviour deliberately, after a previous outage where a credential issue had stalled the pipeline for a full morning. The retry was not a bug. It was the system doing its job.

What the team did not realise was that the retry executed inside a brand new session. Snowflake assigns a new session identifier to each task execution, regardless of whether it is a first attempt or a retry. And the idempotency key derivation depended on the session.

The origin of the broken key

Buried in a stored procedure written eighteen months earlier was this fragment of SQL:

INSERT INTO recon_ledger (idem_key, end_to_end_id, amount_cents, value_date, raw)
SELECT
  SHA2(CONCAT_WS('|', CURRENT_SESSION(), s.end_to_end_id), 256) AS idem_key,
  s.end_to_end_id,
  s.amount_cents,
  s.value_date,
  s.raw_payload
FROM staging_sepa_inbound s
LEFT JOIN recon_ledger r
  ON r.idem_key = SHA2(CONCAT_WS('|', CURRENT_SESSION(), s.end_to_end_id), 256)
WHERE r.idem_key IS NULL;

The original author had reached for CURRENT_SESSION() to scope the key to "this run". Their reasoning, recovered from a Slack thread that night, was that they wanted to rerun a session during development and have it overwrite its own writes. The pattern they had in mind was a developer running the procedure twice from the Snowflake UI, getting clean idempotent behaviour within a single session, and clear separation between debugging sessions.

That intent broke the moment the procedure ran inside a retried task. In practice, every session got a fresh UUID, and the same SEPA transaction running through a retried session produced a different key from the same transaction in the original session. The two runs were now indistinguishable from two genuinely different inbound files that happened to contain the same end-to-end identifier.

The first run wrote 1,180 rows with key SHA2('uuid-A|EE-id', 256). The retried run, looking for collisions on the new key SHA2('uuid-B|EE-id', 256), found none. It happily inserted 1,180 rows. The MERGE was not broken. The key was.

Warning

If your idempotency key references the session, the connection, the worker, or the wall-clock minute of execution, it is not an idempotency key. It is a noise generator wearing a hash function.

The nine-hour blast radius

From 00:14 until the head of operations spotted the discrepancy at 09:14, every downstream consumer of recon_ledger treated the duplicated rows as real. The treasury report generated at 06:00 over-reported inflows by 437,000 euros. Customer-facing balance updates went out at 07:30 with inflated amounts. Three customers had their automated SEPA direct-debit mandates re-triggered because the classification agent saw the duplicate booking as evidence of an underpayment in the original transaction.

One of those mandates would have hit a customer's account for a four-figure amount on the morning of a public holiday in Belgium, after the customer service line had closed. We caught it at 10:48, with seventy minutes of headroom before the daily SEPA cutoff to the bank. The other two were straightforward. The third is the reason this team has since added a hard cap on automated mandate triggers within twelve hours of a recent batch boundary.

The classification agent did exactly what it was designed to do. It just did it on top of a corrupt table.

The classifier's blind spot

The Python classifier had been built carefully. It flagged unusual amounts, unfamiliar counterparties, and timing patterns that did not fit the customer's history. It did not check whether the same end-to-end identifier appeared twice in the same batch. That uniqueness contract had been delegated to Snowflake, and Snowflake had quietly stopped honouring it.

This is the part that catches teams shipping agentic workflows on top of data warehouses. The agent's reasoning is only as trustworthy as the substrate it sits on. If the warehouse promises uniqueness and silently stops delivering, the agent's confidence stays high and its output stays plausible. Plausible-but-wrong is the worst failure mode for any reasoning system. The interesting metric is not "did the agent answer". It is "did the agent answer on top of state that was actually true".

What the dashboards were actually telling us

The reconciliation dashboard had been green from start to finish. So had the task graph. Both monitoring surfaces measured the same thing: did the code complete without throwing an error. Neither measured the only metric that mattered in this incident, which was whether the table the code wrote to still satisfied its own integrity contract.

This is the gap that catches teams shipping any kind of automated pipeline, agentic or not. The substrate underneath the agent makes promises that nothing actively verifies. A schema migration drops a unique constraint that nobody is monitoring. A retry framework changes session identity in a way that breaks an implicit assumption. A connector library updates and starts batching writes differently. None of these show up as red until something downstream falls over.

The team had row-count monitoring on the ledger but not on the ratio of distinct end-to-end identifiers to total rows. They added that check the same afternoon. It would have caught the duplicate within fifteen minutes of the second insert.

Cleaning up 1,180 duplicates at 11am

Snowflake's Time Travel saved the day, as it often does. We pulled the table state from 23:55 the night before and diffed it against the current state. The duplicates were easy to identify once we stopped trusting idem_key as the uniqueness signal and used the natural composite key from the SEPA message itself. The diff showed exactly 1,180 rows had been inserted twice, with byte-for-byte identical content apart from the inserted_at timestamp and the broken idem_key column.

CREATE OR REPLACE TABLE recon_ledger_clean AS
SELECT
  end_to_end_id,
  amount_cents,
  value_date,
  creditor_iban,
  debtor_iban,
  ANY_VALUE(raw_payload) AS raw_payload,
  MIN(inserted_at)      AS inserted_at
FROM recon_ledger
WHERE inserted_at >= '2026-06-08 22:00:00'
GROUP BY end_to_end_id, amount_cents, value_date, creditor_iban, debtor_iban;

Once the ledger was clean, we walked downstream. The treasury report regenerated automatically. The three triggered direct-debit mandates were cancelled before settlement at T+1. Two customer-facing balance updates had to be reversed by hand, with a short apology email going out the same afternoon.

Total cost of the incident in real money: a partial refund to one customer who had based a transfer decision on the inflated balance, plus a long day of engineering. Total cost in trust: harder to measure, and the reason this team will spend the next quarter sanding down every other implicit assumption in the pipeline.

The corrected idempotency key

The fix was four lines of SQL. The idempotency key now derives only from properties intrinsic to the SEPA message and stable across any number of retries, sessions, or workers:

SHA2(
  CONCAT_WS('|',
    s.end_to_end_id,
    s.creditor_iban,
    s.debtor_iban,
    s.amount_cents,
    TO_CHAR(s.value_date, 'YYYY-MM-DD')
  ),
  256
) AS idem_key

This is the principle that the HTTP specification's notion of idempotency already encodes. The key has to be a function of the request, not of the act of processing it. The SEPA end-to-end identifier is unique per scheme by definition; the surrounding fields are belt-and-braces against malformed inputs and stitched-together corner cases.

The team also added a UNIQUE constraint on the natural composite key in the ledger itself. From now on, even if the agent code drifts again, the database will refuse to lie. A failed insert is loud. A silent duplicate is the one that costs you nine hours.

The same seam in our own work

When we built the reconciliation agent for a Dutch logistics client last quarter, the thing we ran into was a slightly different version of the same problem: a Kafka consumer-group rebalance that caused a single message to be processed twice by two different workers within ten milliseconds of each other. We solved it the same way the Gent team eventually did, by writing every idempotency key into a Postgres UNIQUE constraint with ON CONFLICT DO NOTHING, and deriving that key only from the message body's content hash. If you are building AI agents that touch money, this is the seam where most quiet failures will eventually surface.

One thing you can do today

Open your reconciliation, billing, or events pipeline and grep the source for SESSION, UUID(), NOW(), CURRENT_TIMESTAMP, or any random-number call inside an idempotency-key derivation. If you find one, you have the same bug waiting for a transient error to surface it. Replace it with a hash of the input fields, add a unique constraint to the destination table, and sleep better on Tuesdays.

Key takeaway

An idempotency key has to be a function of the request, not of the session that processes it. Anything else is a noise generator wearing a hash function.

FAQ

Why did Snowflake's task retry cause the duplicate?

The retry was correct. The bug was that the idempotency key was computed from CURRENT_SESSION(), so each retry produced a different key for the same transaction and the dedup check missed.

What should an idempotency key actually be derived from?

Only properties that are intrinsic to the message being processed: business identifiers, amounts, dates, counterparties. Never the session, worker, timestamp of execution, or any random value generated at runtime.

Could the classification agent have caught the duplicates?

Yes, with a uniqueness check on end-to-end identifiers per batch. The team had delegated that contract to Snowflake. When the warehouse stopped enforcing it, the agent had no fallback signal to notice.

How did you recover the original ledger state?

Snowflake Time Travel. The team queried the table as it existed at 23:55 the previous evening, diffed it against the current state, and rebuilt the clean ledger by grouping on the natural SEPA composite key.

ai agentsautomationcase studyarchitectureoperationsintegrations

Building something?

Start a project