
Email automation

Outbound email audit: surviving 2026 Outlook deferrals

Before we quote an invoice-chase agent for a Dutch SME, we audit their outbound email estate. Not as a sales gate. As a survival check, run in this order.

Jacob Molkenboer · Founder · A Brand New Company · 19 Sept 2025 · 9 min

Tuesday, 09:14. Finance writes: "Did yesterday's payment reminders actually go out?" The operations lead opens the sending dashboard: 6,200 messages dispatched, 98% delivered. Looks fine. Then a customer calls, angry about a second reminder for an invoice whose first reminder she never saw. Her domain is on Outlook. Her junk folder holds eleven reminders, sitting there since Friday afternoon.

This is the call we get most often now. Not "we want AI agents." Just: our invoice chase stopped working and nobody can tell us where.

Before we quote an invoice-chase agent for a sub-€30M Dutch SME, we audit their outbound email estate. Not as a sales gate. As a survival check. There is no point handing a client a polite, well-written agent that nudges 800 invoices a week into Outlook's quarantine.

Here is the audit, in the order we run it.

What changed in 2026

Microsoft is rolling out stricter sender requirements for Exchange Online inbound mail through 2026. The short version: senders that don't pass DMARC alignment, don't publish a meaningful DMARC policy, or send from misconfigured shared IPs increasingly get deferred (4xx) or flat-rejected (5xx). Outlook.com flipped first. Exchange Online tenants are following.

It looks a lot like the Google/Yahoo bulk-sender push that landed in 2024, but the failure mode is different. Google quietly puts mail in spam. Outlook defers it. Deferred mail sits in a retry queue and eventually arrives, hours or days late, usually after the customer has already paid (or already not paid and gone cold).

That delay is the bit your finance lead won't notice. The deliverability dashboard reads green. The reminders technically got delivered. They just got delivered at 03:40 on Saturday, into a junk folder, which is the same as not getting delivered at all.

The DNS floor

Open a terminal. This is the first thing we run.

dig TXT +short klant-domein.nl
dig TXT +short _dmarc.klant-domein.nl
dig TXT +short selector1._domainkey.klant-domein.nl
dig TXT +short selector2._domainkey.klant-domein.nl

SPF: one record, no more than 10 DNS lookups once every nested include is expanded, covering every legitimate sender. The common failure we see: an old include:spf.protection.outlook.com left over from a 2021 Office 365 migration, plus include:_spf.google.com from a Workspace pilot, plus a forgotten include:mailgun.org from a vendor nobody renewed, plus the SaaS billing tool's include. Four includes, and with their nested lookups easily 11 in total: a silent SPF PermError. SPF then stops counting towards DMARC alignment, and nobody knows why.
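Counting lookups by hand gets error-prone once includes nest. Below is a minimal Python sketch of the RFC 7208 count: include, a, mx, ptr and exists each cost one lookup, the redirect= modifier costs one, and ip4, ip6 and all cost nothing. It walks a dictionary that stands in for DNS (an assumption so the sketch runs offline; in practice you would fill it from dig TXT +short output), and every domain in the sample records is a placeholder.

```python
# RFC 7208 caps the DNS-querying terms at 10 per SPF evaluation.
LOOKUP_MECHANISMS = {"include", "a", "mx", "ptr", "exists"}


def count_spf_lookups(domain, records, _path=frozenset()):
    """Worst-case DNS lookup count for `domain`'s SPF record.

    `records` maps domain -> SPF TXT string and stands in for live DNS.
    Nested include: and redirect= targets are followed recursively.
    """
    if domain in _path:
        raise ValueError(f"SPF include loop at {domain}")  # loops are a PermError
    path = _path | {domain}
    total = 0
    for term in records.get(domain, "").split()[1:]:  # skip "v=spf1"
        term = term.lstrip("+-~?")  # drop qualifiers, e.g. "~all" -> "all"
        if term.lower().startswith("redirect="):
            total += 1 + count_spf_lookups(term.split("=", 1)[1], records, path)
            continue
        name, _, target = term.partition(":")
        name = name.split("/")[0].lower()  # "a/24" -> "a"
        if name in LOOKUP_MECHANISMS:
            total += 1
            if name == "include":
                total += count_spf_lookups(target, records, path)
    return total


# Placeholder records, not real vendor SPF data.
records = {
    "klant-domein.nl": "v=spf1 include:a.example include:b.example ip4:192.0.2.1 ~all",
    "a.example": "v=spf1 include:c.example include:d.example -all",
    "b.example": "v=spf1 a mx -all",
    "c.example": "v=spf1 ip4:198.51.100.0/24 -all",
    "d.example": "v=spf1 ip6:2001:db8::/32 -all",
}
n = count_spf_lookups("klant-domein.nl", records)
print(n, "PermError" if n > 10 else "within limit")  # prints: 6 within limit
```

Treat the number as a worst case. Evaluation stops at the first matching mechanism, so a record over the limit can still pass for senders that match early, which is exactly why the failure looks intermittent.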

DKIM: a 2048-bit key per sending source, each on its own selector. Two failure patterns dominate. The first is one shared 1024-bit key that every sender reuses, which makes rotation impossible without breaking three vendors at once. The second is no DKIM at all on the transactional sender, because someone "couldn't find where to set the DNS" during onboarding. Microsoft's enforcement notes call this out specifically, and the Exchange team has been posting about it on the Microsoft Tech Community blog for the better part of a year.
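To check key length without trusting a vendor dashboard, decode the p= value from the selector's TXT record. A standard-library-only Python sketch, assuming p= holds either a SubjectPublicKeyInfo (what most providers publish) or a bare RSAPublicKey; it reads just enough DER to reach the RSA modulus and does not validate the rest of the structure.

```python
import base64


def _read_tlv(buf, i):
    """Read one DER tag-length-value at offset i; return (tag, value, next_offset)."""
    tag, length = buf[i], buf[i + 1]
    i += 2
    if length & 0x80:  # long form: low 7 bits count the length bytes
        n = length & 0x7F
        length = int.from_bytes(buf[i:i + n], "big")
        i += n
    return tag, buf[i:i + length], i + length


def dkim_rsa_key_bits(txt_record):
    """Modulus size in bits of the RSA key in a DKIM TXT record (0 if revoked)."""
    # dig prints long TXT records as several quoted strings; glue them back.
    txt_record = txt_record.replace('"', "")
    tags = {}
    for part in txt_record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = "".join(value.split())
    if tags.get("k", "rsa").lower() != "rsa":
        raise ValueError("not an RSA key")
    if not tags.get("p"):
        return 0  # an empty p= tag means the key was revoked
    der = base64.b64decode(tags["p"])
    _, outer, _ = _read_tlv(der, 0)  # outer SEQUENCE
    tag, first, nxt = _read_tlv(outer, 0)
    if tag == 0x02:
        modulus = first  # bare RSAPublicKey: the first INTEGER is the modulus
    else:
        # SubjectPublicKeyInfo: skip AlgorithmIdentifier, read the BIT STRING.
        _, bitstring, _ = _read_tlv(outer, nxt)
        _, rsa_key, _ = _read_tlv(bitstring[1:], 0)  # byte 0 = unused-bit count
        _, modulus, _ = _read_tlv(rsa_key, 0)
    return int.from_bytes(modulus, "big").bit_length()
```

Feed it the output of the selector1 and selector2 dig queries above; anything under 2048 goes on the rotation list.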

DMARC: published, with a real policy and a rua= address that someone actually reads. p=none is fine for week one. p=quarantine; pct=100 is where we want clients inside 90 days. p=reject is where the audit ends in green. BIMI is a vanity layer on top, not a deliverability lever, so we audit it last.
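The DMARC ladder above maps onto a handful of tags. A small Python sketch that grades a record against it; the stage labels are ours, made up for illustration, and pct defaults to 100 as RFC 7489 specifies.

```python
def parse_dmarc(record):
    """Split a DMARC TXT record into a lowercase-keyed tag dictionary."""
    tags = {}
    for part in record.replace('"', "").split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_stage(record):
    """Grade a DMARC record against the none -> quarantine -> reject ladder."""
    tags = parse_dmarc(record)
    if tags.get("v") != "DMARC1" or "p" not in tags:
        return "missing"     # no usable record: fix this first
    if "rua" not in tags:
        return "no-rua"      # a policy nobody gets reports on
    policy = tags["p"].lower()
    pct = int(tags.get("pct", "100"))  # RFC 7489: pct defaults to 100
    if policy == "none":
        return "monitoring"  # acceptable for week one
    if policy not in ("quarantine", "reject"):
        return "missing"     # unknown policy value
    if pct < 100:
        return "ramping"
    return "green" if policy == "reject" else "target"  # target = quarantine at 100%
```

For example, dmarc_stage("v=DMARC1; p=quarantine; pct=100; rua=mailto:dmarc@klant-domein.nl") returns "target" (the rua address is a placeholder).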

If any of those three are broken, the rest of the audit is moot. Fix the floor first.

Alignment, not just authentication

This is where most internal audits stop and most external problems start. SPF passing doesn't mean SPF aligning. DKIM signing doesn't mean DKIM aligning. The receiver checks whether an authenticated domain (the envelope MAIL FROM domain for SPF, the d= signing domain for DKIM) matches the domain in the From: header. If your invoice reminders go out from factuur@klant-domein.nl but DKIM signs them with d=mg.transactional-vendor.com, DMARC fails alignment even though both checks technically passed.
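The alignment rule itself is short enough to write down. A Python sketch of DMARC's relaxed and strict modes, with one loud simplification: the organisational domain is taken as the last two labels, which holds for .nl and .com but is wrong for suffixes like .co.uk (real checkers consult the Public Suffix List). DMARC passes if either a passing DKIM d= domain or a passing SPF MAIL FROM domain aligns with the From domain.

```python
def org_domain(domain):
    # ASSUMPTION: the organisational domain is the last two labels. True for
    # klant-domein.nl, wrong for e.g. example.co.uk; real checkers use the
    # Public Suffix List.
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def aligned(auth_domain, from_domain, mode="r"):
    """DMARC identifier alignment: 's' = exact match, 'r' = same org domain."""
    a, f = auth_domain.lower().rstrip("."), from_domain.lower().rstrip(".")
    return a == f if mode == "s" else org_domain(a) == org_domain(f)


def dmarc_pass(from_domain, dkim=None, spf=None, adkim="r", aspf="r"):
    """dkim/spf are (domain, passed) pairs: the DKIM d= domain and the SPF
    MAIL FROM domain. DMARC passes if either one passed AND aligns."""
    for result, mode in ((dkim, adkim), (spf, aspf)):
        if result and result[1] and aligned(result[0], from_domain, mode):
            return True
    return False


# The vendor case from the text: both checks pass, neither aligns.
print(dmarc_pass("klant-domein.nl",
                 dkim=("mg.transactional-vendor.com", True),
                 spf=("bounces.transactional-vendor.com", True)))  # prints: False
```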

We pull a sample of the client's real outgoing mail (with permission, sent to a fresh test inbox) and read the Authentication-Results header by hand. Yes, by hand. It takes ninety seconds and catches what most automated scanners miss.

Authentication-Results: mx.google.com;
  dkim=pass header.i=@mg.transactional-vendor.com;
  spf=pass smtp.mailfrom=bounces.transactional-vendor.com;
  dmarc=fail (p=none dis=none) header.from=klant-domein.nl

That's a fail. SPF and DKIM both passed. DMARC failed because neither authenticated domain (mg.transactional-vendor.com for DKIM, bounces.transactional-vendor.com for SPF) matches the From domain, klant-domein.nl. Under the new Outlook posture, that mail gets deferred. Under Google's posture, it gets junked. Same outcome.
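Reading the header by hand is the point, but when you want to log the same fields across a batch of sample messages, they parse out easily. A Python sketch that splits one Authentication-Results header into method results and properties, dropping comments such as (p=none dis=none). It assumes the common name=value layout shown above rather than implementing the full RFC 8601 grammar.

```python
import re


def parse_auth_results(header):
    """Split one Authentication-Results header into {method: [result dicts]}.

    Handles the common `method=result prop=value ...` layout, not the full
    RFC 8601 grammar.
    """
    if header.lower().startswith("authentication-results:"):
        header = header.split(":", 1)[1]
    segments = header.split(";")
    parsed = {}
    for seg in segments[1:]:  # segments[0] is the authserv-id, e.g. mx.google.com
        seg = re.sub(r"\([^)]*\)", " ", seg)  # drop comments like (p=none dis=none)
        tokens = seg.split()
        if not tokens or "=" not in tokens[0]:
            continue
        method, result = tokens[0].split("=", 1)
        props = dict(t.split("=", 1) for t in tokens[1:] if "=" in t)
        # A list per method, because a message can carry several DKIM signatures.
        parsed.setdefault(method.lower(), []).append({"result": result.lower(), **props})
    return parsed


def dmarc_needs_work(parsed):
    """True if DMARC is absent, failed, or reported as none (no policy found)."""
    verdicts = [r["result"] for r in parsed.get("dmarc", [])]
    return not verdicts or any(v in ("fail", "none") for v in verdicts)
```

Run against the header above, parse_auth_results reports dmarc as fail with header.from=klant-domein.nl, and dmarc_needs_work returns True.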

The fix is one of two things. Either get the vendor to sign with d=klant-domein.nl through a CNAME'd selector under the client's own domain (mail._domainkey.klant-domein.nl as a CNAME to the vendor's selector, so the vendor keeps managing the key), or change the From to a subdomain the vendor controls (which kills brand recognition; we never recommend it for invoice flows).

Warning

If DMARC alignment fails on more than 2% of your invoice reminders, you are gambling cashflow on receiver leniency. Outlook stopped being lenient.
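The 2% figure can be tracked from the rua= reports rather than guessed. A Python sketch that reads one DMARC aggregate (XML) report and returns the share of messages where neither DKIM nor SPF passed aligned, using the policy_evaluated fields RFC 7489 defines for exactly that. It assumes an already-decompressed report without an XML namespace; filtering on header_from lets you isolate a reminder subdomain if you send from one.

```python
import xml.etree.ElementTree as ET

THRESHOLD = 0.02  # the 2% line from the warning above


def alignment_failure_rate(report_xml, header_from=None):
    """Share of messages in one DMARC aggregate report that failed alignment.

    policy_evaluated/dkim and /spf carry the *aligned* results, so a message
    fails DMARC when neither is "pass". Assumes no XML namespace.
    """
    root = ET.fromstring(report_xml)
    total = failed = 0
    for record in root.iter("record"):
        sender = record.findtext("identifiers/header_from", "").lower()
        if header_from and sender != header_from.lower():
            continue
        count = int(record.findtext("row/count", "0"))
        dkim = record.findtext("row/policy_evaluated/dkim")
        spf = record.findtext("row/policy_evaluated/spf")
        total += count
        if dkim != "pass" and spf != "pass":
            failed += count
    return failed / total if total else 0.0
```

One report is one receiver's view of one day, so sum counts across reports before comparing against THRESHOLD.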

Reputation drift across providers

Once the floor is solid, we look at where mail actually leaves from. Most Dutch SMEs we audit send transactional mail through a mix of Microsoft 365 SMTP for "real" mail, one ESP for transactional mail, and Amazon SES for a reminder script someone wrote in a hurry in 2023.

Each provider has a different reputation profile. Each drifts differently.

Postmark

Runs separate IPs for transactional and broadcast streams and polices senders aggressively. Reputation drift here is slow and obvious. You get warning emails before anything breaks. The downsides: cost at volume, and a hard line on anything that smells like marketing on a transactional stream. We've had a client suspended for sending a payment-reminder template that included a one-line cross-sell. Postmark called it broadcast. They were not wrong.

Mailgun

Flexible, which is also the problem. Shared pools have noisy neighbours. We've watched a client's invoice reminders get caught in a deferral wave on Mailgun's EU shared pool because some unrelated tenant ran a bad campaign on the same /24. Dedicated IPs help, but you have to warm them, and most sub-€30M SMEs don't have the volume to keep a dedicated IP warm without padding the schedule.

Amazon SES

The cheapest and the most unforgiving. Reputation is per account. A complaint rate above 0.1% gets you a friendly note; above 0.5%, your sending can be paused. The appeal process is a support ticket. We don't put invoice chases on raw SES without a configuration set, an SNS-wired bounce handler, and a complaint dashboard the operations lead actually opens once a week.
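The minimum we'd expect behind an SNS-wired bounce handler looks something like this Python sketch. It takes the body SNS posts to your endpoint, unwraps the SES notification, and returns the addresses to suppress: permanent bounces and complaints. It accepts both the notificationType field of SES feedback notifications and the eventType field used by configuration-set event publishing. It deliberately skips SNS signature verification and subscription confirmation, which a production endpoint must handle. The complaint thresholds mirror the figures above; AWS computes its rates over its own measurement window, so treat complaint_status as an early warning, not a replica.

```python
import json


def recipients_to_suppress(sns_body):
    """Addresses to stop mailing, from one SNS POST body carrying an SES event.

    Accepts SES feedback notifications ("notificationType") and configuration-set
    event publishing ("eventType"), with or without SNS raw message delivery.
    NOT handled here: SNS signature verification and SubscriptionConfirmation.
    """
    envelope = json.loads(sns_body)
    if envelope.get("Type") == "SubscriptionConfirmation":
        return []
    event = json.loads(envelope["Message"]) if "Message" in envelope else envelope
    kind = event.get("notificationType") or event.get("eventType")
    if kind == "Bounce" and event["bounce"].get("bounceType") == "Permanent":
        return [r["emailAddress"] for r in event["bounce"]["bouncedRecipients"]]
    if kind == "Complaint":
        return [r["emailAddress"] for r in event["complaint"]["complainedRecipients"]]
    return []  # transient bounces, deliveries, sends: nothing to suppress


def complaint_status(complaints, delivered):
    """Rough early warning against the 0.1% / 0.5% complaint lines above."""
    rate = complaints / delivered if delivered else 0.0
    if rate > 0.005:
        return "pause risk"
    if rate > 0.001:
        return "review"
    return "ok"
```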

The trap underneath all three: most clients can't tell us which provider sends what. The marketing platform sends through one. The accounting tool sends through another. The CRM sends through a third. The Microsoft 365 mailbox sends through a fourth. Each one needs its own DKIM selector, its own SPF include, and its own reputation watch. We map this on a whiteboard before quoting anything. It usually takes two hours and surprises everyone in the room, including the IT lead.
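The whiteboard map translates naturally into data. A Python sketch, with hypothetical tool names and selectors, that checks each sender against the published SPF record and flags missing includes, missing or shared DKIM selectors, and includes that no known sender claims (the forgotten-vendor case).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sender:
    tool: str                     # what the business calls it ("accounting tool")
    provider: str                 # who actually hands the mail to the internet
    spf_include: Optional[str]    # include: target this sender needs
    dkim_selector: Optional[str]  # selector under _domainkey, None if unsigned


def inventory_gaps(senders: List[Sender], spf_record: str) -> List[str]:
    includes = {t.split(":", 1)[1] for t in spf_record.split() if t.startswith("include:")}
    gaps, selector_owner = [], {}
    for s in senders:
        if s.spf_include and s.spf_include not in includes:
            gaps.append(f"{s.tool}: {s.spf_include} missing from SPF")
        if not s.dkim_selector:
            gaps.append(f"{s.tool}: no DKIM selector")
        elif s.dkim_selector in selector_owner:
            gaps.append(f"{s.tool}: shares selector {s.dkim_selector} "
                        f"with {selector_owner[s.dkim_selector]}")
        else:
            selector_owner[s.dkim_selector] = s.tool
    # Includes nobody claims are usually a vendor nobody renewed.
    claimed = {s.spf_include for s in senders if s.spf_include}
    for inc in sorted(includes - claimed):
        gaps.append(f"SPF include {inc} has no known sender")
    return gaps
```

An empty list means the map and the DNS agree; everything else is a line item for the quote.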

The three flows that have to survive a deferral

Now the interesting question. Imagine Outlook starts deferring 40% of your mail next Monday morning. Which transactional flows would your finance lead notice the same day, and which would silently break for a week?

Across roughly sixty estates we've audited over the last two years, three flows have to survive a sudden deferral or the business takes real damage:

1. Password resets. If these defer for three hours, your support inbox lights up. Operations notice fast. Good news: most providers route password resets through their strictest transactional stream. Bad news: when someone migrates auth providers (Auth0 to Clerk, Cognito to Supabase), this is the flow that gets forgotten and ends up routed through a half-configured SES sandbox account.

2. Payment receipts. A customer just paid you. They expect a receipt within a minute. If receipts defer for twelve hours, the customer assumes the payment failed and either pays again (refund headache) or charges back (worse). We audit the receipt flow first, every time, because it has the shortest "did anyone notice?" window.

3. Invoice reminders. The flow the client thought they were hiring us to automate. Counterintuitively, this is the flow where a deferral is least likely to be noticed. A reminder going out twelve hours late doesn't visibly change whether the customer pays today. It changes whether they pay at all, but the finance lead won't see that signal for thirty days, by which point three invoice cycles have drifted past due.

If a client's reminder flow is the only one routed through SES with a shaky reputation, we tell them to move it before we automate it. There is no point teaching an agent to write thoughtful, well-timed nudges that arrive on Tuesday evening when they were due Monday morning.
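A deferral shows up as a gap between the Date header and the receiving server's timestamp. A Python sketch that measures that gap on a message saved from a test inbox and compares it against a per-flow budget. The flow names and budget values are hypothetical, and the Date header comes from the sending system's clock, so allow for some skew.

```python
from datetime import timedelta
from email import message_from_string
from email.utils import parsedate_to_datetime

# HYPOTHETICAL budgets per flow; set them to what your customers tolerate.
BUDGETS = {
    "password_reset": timedelta(minutes=5),
    "receipt": timedelta(minutes=1),
    "reminder": timedelta(hours=1),
}


def delivery_delay(raw_message):
    """Time between the sender's Date header and final delivery."""
    msg = message_from_string(raw_message)
    sent = parsedate_to_datetime(msg["Date"])
    # The topmost Received header is added last, by the receiving server;
    # its timestamp follows the final semicolon.
    last_hop = msg.get_all("Received")[0]
    arrived = parsedate_to_datetime(last_hop.rsplit(";", 1)[1].strip())
    return arrived - sent


def over_budget(flow, raw_message):
    return delivery_delay(raw_message) > BUDGETS[flow]
```

A reminder stamped Monday 09:00 that lands Tuesday evening shows up here as a delay of well over a day, long before the thirty-day cashflow signal does.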

The actual checklist

Here's the audit distilled. You can run this against your own estate in an afternoon.

[ ] One SPF record per sending domain, 10 DNS lookups or fewer
[ ] DKIM signing on every transactional source (2048-bit, own selector)
[ ] DMARC published, rua= going to a monitored inbox
[ ] DMARC alignment passing on a sampled From:-domain test
[ ] Subdomain strategy documented (mail., billing., notify., etc.)
[ ] One ESP account per business purpose, not per developer
[ ] Bounce + complaint webhooks wired to a dashboard someone reads
[ ] Reputation snapshot from Google Postmaster Tools + Microsoft SNDS
[ ] Password reset flow tested end-to-end on an Outlook.com inbox
[ ] Receipt flow tested end-to-end on a Hotmail inbox
[ ] Reminder flow tested with a real customer's domain in the set

The last one matters most. Most internal "deliverability tests" send to Gmail and call it a day. Gmail is the easy receiver. Outlook is the hostile one, along with Hotmail, Live, MSN and the rest of the Microsoft estate. Test there or don't bother testing.

What we do with the results

When we built the invoice-chase agent for a Dutch construction-services client this spring, the audit caught a Mailgun pool issue. Their receipt mail was deferring on a shared pool because an unrelated marketing tenant on the same account had been flagged the week before. We moved receipts to Postmark and kept reminders on a warmed dedicated IP, and the chase agent's first run cleared a 71% open rate. Most of our email automation work starts here, at the DNS floor, before any agent gets written.

If you want to run this against your own estate today, pick the receipt flow. Send one from your live system to a fresh Outlook.com address, then open the raw headers and read the Authentication-Results line. If you see dmarc=fail or dmarc=none, you have an afternoon's work in front of you. The agent comes after.

Key takeaway

Outlook stopped being lenient. If DMARC alignment fails on more than 2% of your invoice reminders, you are gambling cashflow on receiver leniency.

FAQ

Does this audit apply if we only send a few hundred emails a week?

Yes. Microsoft's published Outlook.com requirements are aimed at high-volume senders, but a small sender with broken DMARC alignment still risks deferral, just less visibly than a large one.

Can we skip DMARC if our SPF and DKIM both pass?

No. Outlook's 2026 posture explicitly checks for a DMARC record. No DMARC counts as a soft fail and contributes to deferral decisions, even when SPF and DKIM individually pass.

Postmark or Mailgun for transactional invoice mail?

Postmark for receipts and password resets where deliverability has to be boring. Mailgun on a warmed dedicated IP for higher-volume reminder flows. Never mix the two on one stream.

How long does the full audit take?

About four hours for a clean estate, a full day for a tangled one. The mapping of which tool sends through which provider usually takes longer than the DNS work.

email automation · automation · ai agents · operations · integrations · strategy
