
OAuth audit before AI agents: the checklist we run first

A forgotten Microsoft Entra app secret leaked an architecture firm's Outlook attachments for eleven months. The audit we now run before any agent touches production.

Jacob Molkenboer · Founder · A Brand New Company · 10 Jun 2026 · 9 min


The Entra app nobody owned

The pre-flight was meant to be a formality. A Dutch architecture firm had signed off on the inbox-triage agent, deployment was set for the Monday, and we were doing the security walk-through on the Friday afternoon. The ops lead opened the Entra admin portal, sorted enterprise apps by creation date, and started clicking through anything that looked unfamiliar.

Halfway down the list: Mail Export Test 2024. No description. The owner field was blank. One client secret, valid through 2027. One application permission: Mail.Read, tenant-wide, admin-consented. The audit log showed the app had been pulling messages and attachments from every mailbox in the tenant for eleven months. Quietly. At a steady rate. Nobody was getting alerted, because nobody had set up alerts on an app nobody knew existed.

The intern who registered it had left the previous October. The secret he generated had a two-year lifetime because that was the default when he clicked the button. The exports were going to a Cloudflare R2 bucket the firm had never owned.

We did not deploy the agent that Monday.

How agents make the OAuth graveyard worse

Most companies that ask us to build an AI agent have an OAuth and service-account inventory that nobody has looked at in years. The agent is not the cause of the problem. The agent is the trigger that finally forces somebody to look.

A few things change when an agent enters the picture. The agent usually asks for broader read scope than a human integration does. An inbox-triage agent needs to see threads it was not directly forwarded into. A calendar agent needs Calendars.ReadWrite, not just the meeting it was invited to. The agent's identity then sits in your audit log alongside every other OAuth app you have. If the previous fifty apps in your tenant are unowned, unrotated and under-monitored, your new agent inherits a security posture set by the worst of them.

The legal framing is moving too. Under the GDPR you are the data controller for what your agent does inside your tenant, regardless of who originally registered the app or which former intern's client secret is still sitting in a subscription nobody opens. If your agent reads the wrong mailbox or replies to the wrong customer, "the intern's secret leaked" stops being a defensible answer.

Warning

If your audit-log retention is shorter than the time it takes you to notice a leak, your audit log is theatre. Check this number before you check anything else.

The audit, top to bottom

We run eleven checks across the customer's identity providers and any service accounts in their cloud projects. The list below assumes a Microsoft 365 tenant and a single Google Cloud project. The shape ports directly to AWS IAM, Okta, Google Workspace and self-hosted OAuth servers.

1. Enumerate every app and every key

You cannot audit what you cannot list. The first artifact we produce is a single CSV with every Entra app registration and every enterprise app on one side, every Google Cloud service account and every Workspace OAuth client on the other.

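# Read-only Graph scopes are enough for the inventory.
Connect-MgGraph -Scopes 'Application.Read.All','AuditLog.Read.All','Directory.Read.All'

# One row per app registration. Enterprise apps (service principals) need the
# same pass with Get-MgServicePrincipal.
Get-MgApplication -All | ForEach-Object {
  # Owners can be users or service principals; only users carry a UPN.
  $owners = (Get-MgApplicationOwner -ApplicationId $_.Id -All).AdditionalProperties.userPrincipalName
  [PSCustomObject]@{
    DisplayName    = $_.DisplayName
    AppId          = $_.AppId
    Owners         = ($owners -join ',')
    SecretCount    = $_.PasswordCredentials.Count
    # Earliest expiry across all client secrets on the app.
    NextExpiry     = ($_.PasswordCredentials.EndDateTime | Sort-Object | Select-Object -First 1)
    # GUIDs of the permissions the app requests, not human-readable names.
    AppPermissions = ($_.RequiredResourceAccess.ResourceAccess.Id -join ',')
  }
} | Export-Csv -Path entra-apps.csv -NoTypeInformation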

That CSV is the spine of the audit. Every other check decorates a row.

2. Map every identity to a current human owner

For each row in the CSV, name an employee who currently works at the company and can answer for the app. Not a team. Not a Slack channel. A person.

Rows with no owner are the rows that become Mail Export Test 2024. Rows where the owner left the company are functionally the same. In our experience these account for between twenty and forty percent of any tenant older than three years.
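A minimal sketch of this check in Python, assuming the entra-apps.csv from step 1 (with its comma-joined Owners column) and a set of current staff UPNs taken from an HR export. The function name, the HR set and the sample rows are hypothetical, not part of any Microsoft tooling.

```python
import csv
import io


def find_orphaned_apps(csv_text, current_staff):
    """Return (DisplayName, reason) for rows with no accountable human owner.

    csv_text      -- contents of entra-apps.csv as produced in step 1
    current_staff -- UPNs of people employed today (hypothetical HR export)
    """
    staff = {upn.strip().lower() for upn in current_staff}
    orphans = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        owners = [o.strip().lower() for o in row["Owners"].split(",") if o.strip()]
        if not owners:
            orphans.append((row["DisplayName"], "no owner"))
        elif not any(o in staff for o in owners):
            orphans.append((row["DisplayName"], "owner has left"))
    return orphans


# Illustrative data only; the app IDs and UPNs are made up.
SAMPLE = (
    "DisplayName,AppId,Owners\n"
    "Mail Export Test 2024,app-1,\n"
    "Timesheet Sync,app-2,former.intern@example.com\n"
    "Inbox Triage Agent,app-3,ops.lead@example.com\n"
)

if __name__ == "__main__":
    for name, reason in find_orphaned_apps(SAMPLE, {"ops.lead@example.com"}):
        print(f"{name}: {reason}")
```

Treating "owner has left" the same as "no owner" is deliberate: both rows lack a person who can answer for the app today.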

3. Shrink the scopes

For each app, list the scopes it currently has and the scopes it actually uses. On the Microsoft side, the Graph activity logs and the service principal sign-in logs show which endpoints an app called, which is usually enough to infer the permissions it exercised in the last thirty days; Google Cloud Audit Logs do the same for service accounts.

Most apps use one or two of the ten scopes they asked for. Replace broad grants with narrower ones: swap Files.ReadWrite.All for Sites.Selected or Files.SelectedOperations.Selected, and confine an application-level Mail.Read to named mailboxes with Exchange Online's RBAC for Applications. Microsoft's Graph permissions reference lists the granular alternatives for every common scope.
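The comparison itself is a set difference. The sketch below assumes you have already pulled, per app, the granted scopes and the scopes seen in the logs; the input shape and the app name are hypothetical.

```python
def unused_scopes(granted, exercised):
    """Scopes an app holds but did not use in the observation window."""
    return sorted(set(granted) - set(exercised))


def scope_report(apps):
    """Map app name -> unused scopes, omitting apps that use everything they hold.

    apps -- mapping of app name to a (granted, exercised) pair of scope lists
    """
    report = {}
    for name, (granted, exercised) in apps.items():
        unused = unused_scopes(granted, exercised)
        if unused:
            report[name] = unused
    return report


if __name__ == "__main__":
    # Illustrative input only.
    apps = {
        "Calendar agent": (
            ["Calendars.ReadWrite", "Mail.Read", "Files.ReadWrite.All"],
            ["Calendars.ReadWrite"],
        ),
    }
    print(scope_report(apps))
```

Anything in the report is a candidate for removal or for a narrower replacement.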

4. Cap secret and key lifetime

The Entra default for a new client secret is two years. The Google service-account JSON key default is forever. Both are wrong. Set a tenant policy that caps secret lifetime at ninety days for non-interactive apps and rotate on schedule, or move to managed identities and workload identity federation so there is no secret to rotate. Google's own key-management guidance spells out why long-lived JSON keys are a footgun.

# List every user-managed key per service account as: account,key,created,expires
gcloud iam service-accounts list --format='value(email)' \
  | while read -r sa; do
      gcloud iam service-accounts keys list \
        --iam-account="$sa" \
        --filter='keyType=USER_MANAGED' \
        --format='csv[no-heading](name,validAfterTime,validBeforeTime)' \
        | sed "s|^|$sa,|"
    done > gcp-sa-keys.csv
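Once gcp-sa-keys.csv exists, flagging keys over the ninety-day cap is a short pass over its four columns. This is a sketch under two assumptions: timestamps are RFC 3339 with a trailing Z, and a key that never expires reports an end date in the year 9999. The function names and sample rows are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=90)  # the ninety-day cap from this step
NO_EXPIRY_YEAR = 9999         # assumption: non-expiring keys report a year-9999 end date


def parse_ts(value):
    """Parse an RFC 3339 timestamp such as 2021-06-10T00:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def stale_keys(csv_text, now):
    """Return (account, key_id, age_days, never_expires) for keys that break the cap."""
    findings = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        account, name, valid_after, valid_before = row
        age = now - parse_ts(valid_after)
        never_expires = parse_ts(valid_before).year >= NO_EXPIRY_YEAR
        if age > MAX_AGE or never_expires:
            key_id = name.rsplit("/", 1)[-1]  # last path segment of the key name
            findings.append((account, key_id, age.days, never_expires))
    return findings


# Illustrative rows only; accounts and key IDs are made up.
SAMPLE = (
    "billing@example,projects/example/serviceAccounts/billing@example/keys/key-a,"
    "2021-06-10T00:00:00Z,9999-12-31T23:59:59Z\n"
    "export@example,projects/example/serviceAccounts/export@example/keys/key-b,"
    "2026-01-01T00:00:00Z,2026-12-31T00:00:00Z\n"
    "fresh@example,projects/example/serviceAccounts/fresh@example/keys/key-c,"
    "2026-05-01T00:00:00Z,2026-07-30T00:00:00Z\n"
)

if __name__ == "__main__":
    today = datetime(2026, 6, 10, tzinfo=timezone.utc)
    for finding in stale_keys(SAMPLE, today):
        print(finding)
```

Passing `now` in explicitly keeps the check reproducible when you rerun it against an old export.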

5. Restrict where credentials can be used from

Both platforms can tie a credential to where it runs. Microsoft does it with Conditional Access for workload identities, which can restrict a service principal to named IP ranges; on Google Cloud the closest equivalents are workload identity federation and VPC Service Controls. If your agent runs in a single Kubernetes namespace or a single Vercel deployment, the credentials it uses should not be valid from a coffee-shop laptop. Write the policy. Test that it actually blocks.

6. Verify audit log retention

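Find out, today, how long your tenant keeps OAuth app audit events. Microsoft's retention depends on licence: at the time of writing, Entra keeps audit and sign-in logs for 7 days on the free tier and 30 days on P1/P2, and the Purview audit log runs from 180 days on standard licences to a year on E5. Google Cloud's Admin Activity logs are free and kept four hundred days, while Data Access logs are off by default. If retention is shorter than your average time-to-notice, fix that before you fix anything else.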
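The comparison is simple arithmetic, sketched here with the eleven-month leak from the opening as the time-to-notice. The retention values are illustrative, not a statement of what any licence provides.

```python
def unobservable_days(retention_days, days_to_notice):
    """Days of app activity that had already aged out of the log when someone looked."""
    return max(0, days_to_notice - retention_days)


if __name__ == "__main__":
    eleven_months = 11 * 30  # rough length of the leak, in days
    for retention in (30, 400):  # illustrative retention settings
        gap = unobservable_days(retention, eleven_months)
        print(f"{retention}-day retention: {gap} days unrecoverable")
```

A non-zero result means part of the incident can no longer be reconstructed from the log at all.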

7. Set per-app rate limits

A legitimate inbox agent reads, on average, one to five threads per minute per user. A compromised one will try to mirror the entire mailbox. Where the provider lets you cap an app's or project's request quota, set it. Where it does not, enforce the cap in the agent's own client or in a gateway in front of it, and alert on volume. The cost of being slightly throttled during a burst is much smaller than the cost of being the conduit for an exfiltration spike.
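If the provider offers no configurable limit for your app, the same cap can live in the agent's own client. The sketch below is a sliding-window limiter using the five-threads-per-minute ceiling mentioned above; the class and method names are hypothetical, and the clock is passed in so the behaviour is easy to test.

```python
from collections import deque


class ThreadReadLimiter:
    """Sliding-window cap on thread reads per user, enforced client-side."""

    def __init__(self, max_reads, window_seconds=60.0):
        self.max_reads = max_reads
        self.window = window_seconds
        self._reads = {}  # user -> deque of timestamps of permitted reads

    def allow(self, user, now):
        """Permit and record a read at time `now` (seconds), or refuse it."""
        stamps = self._reads.setdefault(user, deque())
        # Drop reads that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.max_reads:
            return False
        stamps.append(now)
        return True


if __name__ == "__main__":
    limiter = ThreadReadLimiter(max_reads=5)
    decisions = [limiter.allow("user@example.com", t) for t in range(7)]
    print(decisions)
```

A refused read is also a useful signal: a burst of refusals for one identity is worth an alert in its own right.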

8. Tag the agent identity

When the agent's identity is created, tag it so it is distinguishable from human OAuth flows in the audit log. We use a purpose=agent extension attribute on Entra service principals and a purpose: agent label on GCP service accounts. The audit query you write at 2am on a Saturday is much faster when "all agent activity in the last hour" is one filter.

9. Write the off-switch

Before deployment, the runbook that revokes the agent's credentials and disables its account must exist, be tested, and be runnable by somebody who is not the engineer who built the agent. Sixty seconds from "something is wrong" to "agent cannot read anything" is the target. We rehearse this in the pre-flight.
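One way to rehearse the sixty-second target is to time the runbook as a list of steps. This is a sketch only: the step names are hypothetical and the actions are placeholders for whatever revocation calls your provider requires.

```python
import time


def run_off_switch(steps, budget_seconds=60.0, clock=time.monotonic):
    """Run (name, action) steps in order and report whether the drill met the budget."""
    started = clock()
    timings = []
    for name, action in steps:
        step_start = clock()
        action()
        timings.append((name, clock() - step_start))
    total = clock() - started
    return {
        "total_seconds": total,
        "within_budget": total <= budget_seconds,
        "steps": timings,
    }


if __name__ == "__main__":
    # Placeholder actions; a real drill would call the provider's revocation APIs.
    drill = [
        ("revoke agent credentials", lambda: None),
        ("disable agent account", lambda: None),
    ]
    print(run_off_switch(drill))
```

Keeping per-step timings shows which step to shorten when a rehearsal runs over budget.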

10. Turn on the anomaly alerts

Microsoft Defender for Cloud Apps and Google Workspace's Alert Center both ship rules for anomalous OAuth use. Turn them on. Pipe the alerts to a channel that is read on weekends. The reason Mail Export Test 2024 ran for eleven months is that nobody had wired the alert up.

11. Decommission the dead

Every audit produces a list of apps that should not exist. Apps tied to former employees. Secrets that have not been used in two years. Service accounts that no workload claims. Delete them. Not "disable", not "review later". Delete. If somebody screams, you have just found the actual owner, which is information you needed anyway.

What the report actually looks like

The output of the audit is a single Markdown document the client signs off on. Three columns of triage: keep (with the changes above), rotate (used but mis-scoped), kill (no owner, no traffic, or both).
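The three buckets follow directly from the earlier checks. A minimal sketch of the rule, assuming each identity has already been reduced to three yes/no answers; the field names and sample identities are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Identity:
    name: str
    has_owner: bool    # a named, current employee answers for it
    has_traffic: bool  # any use seen in the audit window
    over_scoped: bool  # holds permissions it does not exercise


def triage(identity):
    """Kill if unowned or unused, rotate if used but mis-scoped, otherwise keep."""
    if not identity.has_owner or not identity.has_traffic:
        return "kill"
    if identity.over_scoped:
        return "rotate"
    return "keep"


def summarise(identities):
    """Count identities per triage bucket."""
    return Counter(triage(i) for i in identities)


if __name__ == "__main__":
    # Illustrative identities only.
    sample = [
        Identity("Unowned export app", has_owner=False, has_traffic=True, over_scoped=True),
        Identity("Calendar sync", has_owner=True, has_traffic=True, over_scoped=True),
        Identity("Inbox triage agent", has_owner=True, has_traffic=True, over_scoped=False),
    ]
    print(dict(summarise(sample)))
```

Note that an unowned app lands in kill even when it has traffic, which is the case that matters most.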

For the architecture firm, the first audit produced 134 identities. Kill: 41. Rotate: 67. Keep as-is: 26. The Outlook leak was the worst find, but it was not the only one. Two service accounts in their billing GCP project had Owner role and JSON keys generated in 2021. Four Power Automate flows were running under the personal Microsoft account of a developer who had left two years earlier.

The agent went into production three weeks later, into a tenant that was, for the first time in a long time, mapped.

One thing to do today

Open your Entra portal (or your Google Cloud IAM page, or your Okta admin console) and sort by creation date, oldest first. Click into the ten oldest non-Microsoft entries. For each one, answer two questions: who at the company owns this, and when was the secret last rotated. If more than two of the ten leave either question unanswered, you have the same audit on your hands that the architecture firm did, and you should run it before the next agent goes anywhere near production.

When we built the inbox-triage agent for that firm, the lesson was that the agent itself was the easy part. Getting their tenant ready for any new automation took longer than building the agent on top of it, and that was the work that actually made the deployment safe.

Key takeaway

Before an AI agent touches production, audit every OAuth app and service account that was already there. The agent is rarely the weak link; the graveyard around it usually is.

FAQ

What's the single highest-priority finding to fix first?

Application-level Mail.Read (or its Google equivalent) granted tenant-wide to an app whose owner has left the company. That is a year of attachments waiting to be exfiltrated.

How long does the audit take in practice?

Half a day for a tenant with under fifty registrations and one cloud project. A week for an enterprise with ten years of OAuth sprawl. The export queries run in minutes; the human triage takes the time.

Can't we just trust the AI agent vendor's stated permissions list?

You can trust their stated scopes. You cannot trust the rest of the tenant. The agent is joining a pile of existing identities, and the pile is what gets compromised.

Does the same checklist work for Google Workspace and AWS?

Yes. List service accounts and OAuth clients, check key age, map to owners, cap lifetimes, restrict by workload. The Microsoft example is just where the worst leak we have seen happened.

What if the audit blocks our agent launch by weeks?

Then the audit did its job. Launching an agent on top of an unmapped tenant is how you turn a productivity project into an incident-response project. Three weeks of delay beats three years of attachments in someone else's bucket.

