← Blog

Magento

Magento 2 sales_order forensics: joins that flag refund fraud

Refund fraud rarely announces itself in Magento 2. It hides in joins you have not run yet. Here is how we read sales_order when something feels off in the dashboard.

Jacob Molkenboer· Founder · A Brand New Company· 5 Jun 2026· 7 min
Open leather ledger with brass magnifier, paper receipt, green page tab, red wax seal on ivory desk by window.

The CFO emails on a Tuesday afternoon. Refund rate has crept from 3.1% to 5.4% over six months. Marketing did not change. The product mix did not change. Nobody on the support team can remember anything different. The Magento admin shows credit memos going out at the usual pace and the usual sizes. Nothing is on fire. Something is wrong.

The Magento dashboard is not built to answer this. It shows credit memos one at a time, grouped by who clicked the button. Refund fraud almost never looks suspicious in any single row. It looks suspicious in the relationships between rows. So you stop using the admin and you open a read replica.

The schema you actually care about

The interesting tables for refund forensics are smaller than the full sales graph. Five carry most of the signal:

  • sales_order: the header row. The columns that matter are entity_id, increment_id, customer_id, customer_email, base_grand_total, total_refunded, status, created_at.
  • sales_creditmemo: one row per refund document, joined to the order through order_id.
  • sales_creditmemo_item: the line items inside each credit memo.
  • sales_order_payment: the payment method and, depending on your gateway, a fingerprint of the card or wallet.
  • sales_order_address: billing and shipping addresses, stored per order rather than per customer.

If you want the field-by-field definition that survives every version bump, read the Sales module's db_schema.xml files on Magento's GitHub. That is the source of truth; blog posts about the schema lag behind it.

Customers who own more accounts than they should

The cleanest external refund-fraud signal is one human running multiple Magento customers. Same shipping street, same card, different email. The admin will show two unrelated accounts; the database will not.

Start with reuse of shipping address across customer_ids:

SELECT
  CONCAT(LOWER(TRIM(soa.street)), '|', soa.postcode, '|', soa.country_id) AS addr_hash,
  COUNT(DISTINCT so.customer_id)    AS customers,
  COUNT(DISTINCT so.customer_email) AS emails,
  SUM(so.base_grand_total)          AS lifetime_sales,
  SUM(so.total_refunded)            AS lifetime_refunded,
  ROUND(SUM(so.total_refunded) / NULLIF(SUM(so.base_grand_total), 0), 3) AS refund_rate
FROM sales_order so
JOIN sales_order_address soa
  ON soa.parent_id   = so.entity_id
 AND soa.address_type = 'shipping'
WHERE so.created_at >= NOW() - INTERVAL 12 MONTH
GROUP BY addr_hash
HAVING customers   >= 3
   AND refund_rate >= 0.25
ORDER BY lifetime_refunded DESC
LIMIT 50;

The shape that comes back is not subtle. A normal household has one or two customer_ids at a given address over a year. A refund ring will show seven, twelve, twenty. The refund_rate column is what flips a curious result into an investigation.

Refunds processed before the money moved

Magento allows offline credit memos: a refund document recorded in the ledger without telling the payment gateway to send money. There are legitimate reasons for this. Cash on delivery, bank transfer reversals already handled by accounting, store credit. It is also the single most common path a dishonest staff member uses, because no real transaction has to clear.

Adobe documents the offline-refund mechanics in its credit memo guide. What the guide does not do is tell you which ones to look at. This query does:

SELECT
  so.increment_id,
  so.customer_email,
  so.base_grand_total,
  scm.created_at         AS refunded_at,
  scm.grand_total        AS refund_amount,
  sop.method             AS payment_method,
  sop.last_trans_id      AS gateway_txn
FROM sales_creditmemo scm
JOIN sales_order         so  ON so.entity_id  = scm.order_id
JOIN sales_order_payment sop ON sop.parent_id = so.entity_id
LEFT JOIN sales_invoice  si  ON si.order_id   = so.entity_id
WHERE scm.created_at >= NOW() - INTERVAL 6 MONTH
  AND (si.entity_id IS NULL OR sop.last_trans_id IS NULL)
ORDER BY scm.created_at DESC;

The honest cases in this set are easy to spot once you read them: COD orders, manual bank refunds, store credit issued for damaged goods. Everything else is a conversation.

Address fingerprints across "different" customers

The address join is more powerful than email matching because addresses are harder to spoof than mailboxes. A Gmail dot trick gives the same human ten emails. A delivery address is still where someone is signing for the package.

Normalise before you join. Magento stores addresses with whatever the customer typed. Capitalisation differs, "Straat" against "straat", trailing whitespace, optional house number suffixes. Two rows that look identical to a human can hash to four different strings.

Warning

If you compare addresses with raw equality you will miss most of the rings. Lowercase, trim, collapse internal whitespace, and hash. The five-line normalisation step is the difference between zero results and twenty.

A MySQL-side normaliser, good enough for a first pass:

CREATE OR REPLACE VIEW v_addr_norm AS
SELECT
  soa.parent_id AS order_id,
  soa.address_type,
  LOWER(TRIM(REGEXP_REPLACE(soa.street, '[[:space:]]+', ' '))) AS street_n,
  LOWER(TRIM(soa.postcode))                                    AS postcode_n,
  LOWER(TRIM(soa.country_id))                                  AS country_n,
  MD5(CONCAT_WS('|',
        LOWER(TRIM(REGEXP_REPLACE(soa.street, '[[:space:]]+', ' '))),
        LOWER(TRIM(soa.postcode)),
        LOWER(TRIM(soa.country_id))))                          AS addr_fp
FROM sales_order_address soa
WHERE soa.address_type IN ('billing', 'shipping');

Join that view back to sales_order wherever you were grouping on raw fields. The fingerprint column gives you a stable handle on "this is the same address" even when one row says Damrak 1 and another says DAMRAK 1 .

Staff-side patterns

Internal refund fraud is rarer than external rings but it loses more money per incident. The signal is concentration: refunds bunched outside business hours, or one terminal producing a disproportionate share of high-value credit memos.

Open-source Magento does not record the admin user on the credit memo row itself. On Adobe Commerce, or with the admin action log module enabled, you get attribution. Without it, the timing distribution is what you have, and it is enough to start:

SELECT
  DATE(scm.created_at)                                        AS day,
  SUM(CASE WHEN HOUR(scm.created_at) BETWEEN 8 AND 19 THEN 1 ELSE 0 END) AS in_hours,
  SUM(CASE WHEN HOUR(scm.created_at) NOT BETWEEN 8 AND 19 THEN 1 ELSE 0 END) AS out_of_hours,
  SUM(scm.grand_total)                                        AS refund_eur,
  SUM(CASE WHEN HOUR(scm.created_at) NOT BETWEEN 8 AND 19
           THEN scm.grand_total ELSE 0 END)                   AS refund_eur_out_of_hours
FROM sales_creditmemo scm
WHERE scm.created_at >= NOW() - INTERVAL 12 MONTH
GROUP BY day
HAVING out_of_hours > 0
ORDER BY refund_eur_out_of_hours DESC
LIMIT 60;

You are not looking for one row, you are looking for a distribution. If most of the volume is in business hours and a single Tuesday is 19% of the year's out-of-hours refunds, that is the day you reconstruct from logs.

Putting it in a nightly job

None of these queries are expensive on a healthy sales_order with proper indexes on created_at, customer_email, and parent_id. Run them as a set of SQL files against the read replica every night at 02:30, write the results into a flagged_orders table, and let the operations lead see new rows in the morning.

That is the entire shape. Replica, six queries, one table, one email. No new SaaS, no fraud-detection vendor, no machine learning model you cannot explain to the auditor.

What to do this week

If you run a Magento 2 store and you have not opened sales_order directly in the last quarter, start with the address-fingerprint query above against a twelve-month window. It takes ten minutes to run and two minutes to read. The result is either quiet or very loud. Both outcomes are useful.

When we built the refund-forensics job for a wholesale Magento client this spring, the part we underestimated was address normalisation; half the rings only held up as clusters after we added the regex pass shown above. That kind of cleanup is most of what a real Magento engagement looks like once you are past the demo.

Open a SQL client. Point it at the read replica. Run the address-fingerprint query against the last twelve months. That is the smallest first step.

Key takeaway

Refund fraud lives in the joins your Magento dashboard never runs; six SQL queries on a read replica replace most vendor fraud tooling.

FAQ

Will these queries hurt production?

Run them against a read replica, not the primary. On a normally indexed sales_order with twelve months of data they finish in seconds. The view normalising addresses is the heaviest piece; materialise it as a table if your dataset is large.

Do I need Adobe Commerce for the staff-attribution query?

For per-admin-user attribution on credit memos, effectively yes, or you need the admin action log module installed on open-source Magento. The hour-distribution query in this post works on any 2.x build without extras.

What about legitimate households with multiple accounts at one address?

They show up in the address-fingerprint query but with low refund rates. The HAVING clause filters on a 25% refund rate, which excludes almost every honest household and keeps the rings.

How is this different from Adobe Sensei or a fraud-detection vendor?

It is explainable. Every flag is a SQL row you can audit and a join you can show a regulator. Vendor models often cannot show why a transaction was flagged, which matters when the auditor asks.

magentomysqle-commercesecurityoperations

Building something?

Start a project