Microsoft 365 audit checklist: before you ship an inbox agent

Before we connect an agent to a tenant, we run a four-hour audit: twelve service principals, thirty shared mailboxes, one Graph token rotation, AVG trail intact.

Jacob Molkenboer · Founder · A Brand New Company · 18 Jun 2026 · 9 min

It is 09:00 on a Tuesday and the operations lead at a 180-person Dutch logistics firm is on a Teams call asking when we can ship the inbox-triage agent. The contract is signed. The mailboxes are mapped. The shared service account is provisioned. And we say: not yet. We need four hours with the tenant first.

Every Microsoft 365 retrofit we quote begins with the same audit. It is not optional and it is not a sales gate. It is the only way to know what the agent will actually be allowed to do, what it will leave behind in the audit log, and what will quietly break the first time a delegated token rotates. Below is the version of the checklist we run in mid-2026, after fourteen agents in production and one near-miss with a tenant that had thirty-one shared mailboxes with audit logging silently disabled.

Twelve service principals, scored by blast radius

Every tenant has hundreds of service principals. Most are dormant first-party apps Microsoft installed at provisioning. The twelve that matter are the ones with active sign-ins in the last thirty days and at least one tenant-wide Graph permission. We rank them by blast radius, defined as: if this app's client secret leaked at 02:00 on a Saturday, what could it read or send before anyone noticed on Monday?

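# Requires the Microsoft.Graph and Microsoft.Graph.Beta.Reports modules.
Connect-MgGraph -Scopes "Application.Read.All","AuditLog.Read.All"

# Get-MgServicePrincipal does not return sign-in activity. It comes from the
# beta servicePrincipalSignInActivities report, which is keyed by AppId.
$cutoff = (Get-Date).AddDays(-30)
$lastSignIn = @{}
Get-MgBetaReportServicePrincipalSignInActivity -All | ForEach-Object {
  $lastSignIn[$_.AppId] = $_.LastSignInActivity.LastSignInDateTime
}

Get-MgServicePrincipal -All -Filter "servicePrincipalType eq 'Application'" |
  Where-Object { $lastSignIn[$_.AppId] -and $lastSignIn[$_.AppId] -gt $cutoff } |
  ForEach-Object {
    $assignments = Get-MgServicePrincipalAppRoleAssignment -ServicePrincipalId $_.Id -All
    [PSCustomObject]@{
      DisplayName = $_.DisplayName
      AppId       = $_.AppId
      # AppRoleId values are GUIDs; resolve them to permission names before scoring.
      Roles       = ($assignments.AppRoleId -join ',')
      LastSignIn  = $lastSignIn[$_.AppId]
    }
  } | Sort-Object LastSignIn -Descending | Select-Object -First 12 |
  Export-Csv ./sp-top12.csv -NoTypeInformation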

Scoring is simple. A 1 means the app holds Mail.Read or User.Read.All: bad, but bounded. A 3 means Mail.ReadWrite or Calendars.ReadWrite.All. A 5 means the app can send mail on behalf of users or write to the whole directory: Mail.Send, Group.ReadWrite.All, Directory.ReadWrite.All. Anything scored 4 or higher with no conditional access policy attached gets flagged red. In eight of the last ten audits, at least two of the twelve scored 4 or higher with no policy at all.
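
The scoring rule can be sketched as a small lookup. This is an illustration rather than our production script: the table holds only the permissions named above, `Get-BlastRadius` and its parameters are hypothetical names, and anything not in the table scores 0 and still needs a human look.

```powershell
# Illustrative sketch: the table mirrors the 1/3/5 anchors in the text.
# Get-BlastRadius and $BlastRadiusScore are hypothetical names.
$BlastRadiusScore = @{
  'Mail.Read'               = 1
  'User.Read.All'           = 1
  'Mail.ReadWrite'          = 3
  'Calendars.ReadWrite.All' = 3
  'Mail.Send'               = 5
  'Group.ReadWrite.All'     = 5
  'Directory.ReadWrite.All' = 5
}

function Get-BlastRadius {
  param(
    [string[]]$Permissions,       # resolved permission names, not role GUIDs
    [bool]$HasConditionalAccess   # is any CA policy scoped to this service principal?
  )
  # An app is as dangerous as its most dangerous permission.
  $score = 0
  foreach ($p in $Permissions) {
    if ($BlastRadiusScore.ContainsKey($p) -and $BlastRadiusScore[$p] -gt $score) {
      $score = $BlastRadiusScore[$p]
    }
  }
  [PSCustomObject]@{
    Score = $score
    Red   = ($score -ge 4 -and -not $HasConditionalAccess)
  }
}

# An app holding Mail.Read and Mail.Send with no policy scores 5 and is flagged red.
Get-BlastRadius -Permissions 'Mail.Read','Mail.Send' -HasConditionalAccess $false
```

Taking the maximum rather than a sum keeps the score tied to the single worst thing a leaked secret could do.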

Mailbox audit retention on the top thirty shared inboxes

Shared mailboxes are where the agent will live. They are also where Microsoft's defaults will quietly let you down. Mailbox audit logging has been on by default since 2019, and on paper that default covers shared mailboxes as well as user mailboxes. In practice the effective state depends on the tenant's history: tenants that migrated from on-prem Exchange or sit on older E1 plans often have it off for shared mailboxes, or carry audit sets that were customised years ago and never picked up newer operations. The Microsoft documentation on mailbox audit logging is clear enough, but the defaults will not save you.

We pull the top thirty shared mailboxes by item count, then check three flags: AuditEnabled, AuditLogAgeLimit, and which operations are recorded for the Delegate logon type. The agent runs as Delegate. If MailItemsAccessed is not in the audit set for that logon type, you cannot reconstruct what the agent read after the fact. Under the AVG (the GDPR, as it is known in the Netherlands) that is the difference between an explainable disclosure and a notification to the Autoriteit Persoonsgegevens.

# Requires the ExchangeOnlineManagement module.
Connect-ExchangeOnline

# Get-EXOMailbox only returns the audit fields when the Audit property set is requested.
Get-EXOMailbox -RecipientTypeDetails SharedMailbox -ResultSize Unlimited -PropertySets Minimum,Audit |
  ForEach-Object {
    $stats = Get-EXOMailboxStatistics -Identity $_.UserPrincipalName
    [PSCustomObject]@{
      Mailbox          = $_.UserPrincipalName
      ItemCount        = $stats.ItemCount
      AuditEnabled     = $_.AuditEnabled
      AuditLogAgeLimit = $_.AuditLogAgeLimit
      DelegateOps      = ($_.AuditDelegate -join ',')   # operations logged for Delegate access
    }
  } | Sort-Object ItemCount -Descending | Select-Object -First 30 |
  Export-Csv ./mailbox-audit.csv -NoTypeInformation
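
To turn that CSV into a list of findings, something like the following would do. It is a sketch that assumes the column names used in the export above; `Test-DelegateAudit` is a hypothetical helper name, not part of any module.

```powershell
# Sketch: flags mailboxes where auditing is off or the Delegate audit set
# lacks MailItemsAccessed. Test-DelegateAudit is a hypothetical helper.
function Test-DelegateAudit {
  param([Parameter(ValueFromPipeline)]$Row)
  process {
    # DelegateOps is a comma-joined string in the CSV; split it back into a list.
    $ops = @($Row.DelegateOps -split ',' | ForEach-Object { $_.Trim() })
    [PSCustomObject]@{
      Mailbox           = $Row.Mailbox
      AuditEnabled      = ($Row.AuditEnabled -eq 'True')
      MailItemsAccessed = ($ops -contains 'MailItemsAccessed')
    }
  }
}

if (Test-Path ./mailbox-audit.csv) {
  Import-Csv ./mailbox-audit.csv | Test-DelegateAudit |
    Where-Object { -not $_.AuditEnabled -or -not $_.MailItemsAccessed }
}
```

Every row this prints is a mailbox where the agent's reads could not be reconstructed later.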

The default AuditLogAgeLimit is 90 days. We push it to 365 on any mailbox the agent will touch, longer if the client has a fiscal retention obligation. Under the Algemene wet inzake rijksbelastingen that means seven years for anything that touches an invoice.
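
A hedged sketch of the remediation for a single mailbox follows. The mailbox address is a placeholder, an Exchange Online session is assumed, and whether MailItemsAccessed is actually recorded can depend on the tenant's licensing, so treat this as a starting point. The `-WhatIf` switch previews the change without applying it.

```powershell
# Sketch only: assumes an active Connect-ExchangeOnline session.
# The mailbox address below is a hypothetical placeholder.
$auditSettings = @{
  Identity         = 'finance@contoso.example'
  AuditEnabled     = $true
  AuditLogAgeLimit = '365.00:00:00'                  # dd.hh:mm:ss; raise for fiscal retention
  AuditDelegate    = @{ Add = 'MailItemsAccessed' }  # extend the Delegate audit set
}

# Only attempt the call when the Exchange cmdlets are loaded in this session.
if (Get-Command Set-Mailbox -ErrorAction SilentlyContinue) {
  Set-Mailbox @auditSettings -WhatIf
}
```

Drop `-WhatIf` once the preview matches what you expect, then re-run the audit export to confirm the new values.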

Conditional access gaps that bite agents specifically

Conditional access for workload identities went generally available in 2022, but four years in, the adoption rate inside our SME book is still under half. The most common gap is a tenant with a beautiful CA stack for human users (MFA, compliant device, named locations) and absolutely nothing scoped to service principals. The agent's secret is then just a string in a key vault that any compromised admin session can exfiltrate. The Microsoft reference on conditional access for workload identities is the right starting point if your tenant has none.

Three policies are mandatory before we ship. First, block the agent's app from any IP outside our deployment ranges and the client's office ranges; the agent has no business calling Graph from a coffee shop. Second, require token protection on any session that holds Mail.ReadWrite, so a stolen refresh token cannot be replayed from a different device. Third, block legacy authentication on the service principal explicitly. The tenant-wide policy that blocks legacy auth for human users does not cascade to workload identities. The two are scoped separately, and we have seen that catch out four tenants in a row.
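
The first of those policies can be sketched as a Graph conditional access body. This covers the location block only. Both GUIDs are placeholders, the display name is made up, and the policy starts in report-only mode. We assume the Microsoft.Graph.Identity.SignIns module, and to the best of our knowledge a Workload Identities licence, for the final commented-out call.

```powershell
# Sketch of the location policy only. Both IDs are hypothetical placeholders.
$agentSpObjectId = '00000000-0000-0000-0000-000000000001'  # agent's service principal object ID
$trustedLocation = '00000000-0000-0000-0000-000000000002'  # named location holding allowed ranges

$policy = @{
  displayName   = 'Block inbox agent outside trusted ranges'
  state         = 'enabledForReportingButNotEnforced'   # report-only until verified
  conditions    = @{
    applications       = @{ includeApplications = @('All') }
    clientApplications = @{ includeServicePrincipals = @($agentSpObjectId) }
    locations          = @{
      includeLocations = @('All')              # everywhere...
      excludeLocations = @($trustedLocation)   # ...except the trusted ranges
    }
  }
  grantControls = @{ operator = 'and'; builtInControls = @('block') }
}

# Inspect the JSON that would be sent to Graph.
$policy | ConvertTo-Json -Depth 5

# To create it (needs the Policy.ReadWrite.ConditionalAccess scope):
# New-MgIdentityConditionalAccessPolicy -BodyParameter $policy
```

Running it in report-only mode for a few days shows which of the agent's calls would have been blocked before anything is enforced.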

The delegated Graph token rotation drill

This is the test that decides whether an agent ships on schedule or slips two weeks. We pick three departments, typically Sales, Finance, and Operations, and ask: if we rotate the delegated Graph token for the agent's service account at 09:00 on a Tuesday, which department's inbound mail trail breaks?

The answer should be "none". The actual answer, in the first audit we did this year, was "Finance, because the invoice-chase macro someone bolted onto Outlook in 2019 also held a delegated token under the same UPN, and it had been silently failing to write to the shared journal for eight months without anyone noticing."

The drill itself: rotate the token in a dev tenant clone first, then walk through each department's known automation surface and confirm every write to the audit-relevant trail still lands. The short version of our runbook: nothing that touches klant-correspondentie (customer correspondence) should depend on a single token whose rotation schedule no human owns.

AVG and the klant-correspondentie trail

The Algemene Verordening Gegevensbescherming is not subtle about audit trails for customer correspondence. If your agent reads, classifies, or replies to a mail from a klant, you must be able to reconstruct what it did and why, and the data subject can ask you for that reconstruction. Under article 30 you need a register of processing activities; under article 32 you need the technical and organisational measures to back it. The Autoriteit Persoonsgegevens has been consistent that "we lost the log" is not a defence.

Concretely: if the agent reads a customer mail and decides not to escalate, that decision is a processing event. The minimum we capture is timestamp, mailbox, message ID, agent decision, model version, and the policy version that produced the decision. We write it to an append-only log that lives outside the M365 tenant. A Postgres table with a write-once role works. An S3 bucket with object lock works better.
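
As an illustration of the minimum record, the sketch below builds one decision event as a single JSON line. All names and sample values are hypothetical, and the local file is only a stand-in: in a real deployment the line would go to the out-of-tenant, append-only target described above.

```powershell
# Sketch: one decision event with the six fields listed in the text.
# New-AgentDecisionRecord and all sample values are hypothetical.
function New-AgentDecisionRecord {
  param(
    [string]$Mailbox,
    [string]$MessageId,
    [string]$Decision,
    [string]$ModelVersion,
    [string]$PolicyVersion
  )
  [ordered]@{
    timestamp     = (Get-Date).ToUniversalTime().ToString('o')   # ISO 8601, UTC
    mailbox       = $Mailbox
    messageId     = $MessageId
    decision      = $Decision
    modelVersion  = $ModelVersion
    policyVersion = $PolicyVersion
  } | ConvertTo-Json -Compress
}

$line = New-AgentDecisionRecord -Mailbox 'support@contoso.example' `
  -MessageId 'msg-0001' -Decision 'no-escalation' `
  -ModelVersion 'model-2026-05' -PolicyVersion 'triage-v7'

# Local stand-in for the append-only store; appends, never rewrites.
Add-Content -Path ./agent-decisions.jsonl -Value $line
```

One JSON object per line keeps the log easy to ship to object storage and easy to replay when a data subject asks what the agent did with their mail.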

Warning

If the only audit trail of what your agent did lives inside the same tenant the agent has write access to, you do not have an audit trail. You have a suggestion.

The actual four-hour checklist

Here is the checklist in the order we run it. Each item produces a CSV or a JSON artefact that goes into the engagement folder; nothing gets checked off until the artefact exists.

1. Tenant inventory: licenses in force, MFA enforcement state, named global admins, break-glass accounts confirmed and tested.
2. Service principal census: top twelve by recent sign-in, scored 1 to 5 on blast radius.
3. Conditional access gap report scoped to workload identities specifically.
4. Top thirty shared mailbox audit posture, as the CSV above.
5. Token rotation dry-run in a dev tenant clone for three departments.
6. AVG processing register entry drafted for the agent in question.
7. Append-only decision log target provisioned and tested end-to-end.
8. Sign-off in writing from the client's DPO, or where there is none, the operations lead.

It takes four hours of senior consultant time if the client's admin grants Global Reader and Security Reader upfront. It takes two days of calendar time if access has to be scheduled. We will not quote agent work without it.

When we built the inbox-triage agent for a 90-person Dutch wholesale client last quarter, the thing we ran into was item five: the rotation drill broke their Finance department's invoice journal, which nobody had touched since 2020. We rebuilt that journal as a small service that owns its own credentials and writes through a tested interface, and the agent shipped a week later instead of three months later, which is when the journal would otherwise have failed in production. That sort of pre-flight is most of what makes an AI agent survive contact with a real tenant.

The smallest thing you can do today: run the mailbox audit one-liner against your own tenant and count how many of your top thirty shared mailboxes have AuditEnabled set to true. If the number is under twenty-five, you have a project.
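
Counting the result is a few lines once the export exists. This sketch assumes the `mailbox-audit.csv` file and `AuditEnabled` column from the mailbox audit script; `Get-AuditEnabledCount` is a hypothetical helper name.

```powershell
# Sketch: counts how many exported mailboxes have AuditEnabled set to True.
# Get-AuditEnabledCount is a hypothetical helper name.
function Get-AuditEnabledCount {
  param([string]$Path = './mailbox-audit.csv')
  # @() forces an array so .Count is reliable even with zero or one row.
  $rows    = @(Import-Csv -Path $Path)
  $enabled = @($rows | Where-Object { $_.AuditEnabled -eq 'True' })
  [PSCustomObject]@{ Enabled = $enabled.Count; Total = $rows.Count }
}

if (Test-Path ./mailbox-audit.csv) { Get-AuditEnabledCount }
```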

Key takeaway

Before any inbox agent touches a tenant, score twelve service principals, audit thirty shared mailboxes, and dry-run a Graph token rotation.

FAQ

Do I need to run this audit if my Microsoft 365 tenant is brand new?

Yes. Default settings on new tenants are tuned for user-friendliness, not for hosting an autonomous agent that reads customer mail. The first audit on a fresh tenant usually surfaces three to five gaps.

How long does the checklist take end-to-end?

Four hours of senior consultant time if the client's admin grants Global Reader and Security Reader upfront. Two days of calendar time if access has to be scheduled. We will not quote agent work without it.

Can the agent itself run the audit?

No. The audit needs read access to tenant-wide directory and security data that you should never grant to an automation account. A human admin runs it, in a session that is revoked when the audit is done.

What if the client's M365 admin is an external MSP?

Common at this size. We pull the MSP into the engagement and ask for read-only Global Reader plus Security Reader for the audit window. If they refuse, that is a finding in itself and goes into the report.
