
MCP server audits: a four-axis checklist for small teams

Every sub-€30M client we audit has the same MCP problem: nobody knows who installed what, which scopes those connectors hold, or how to revoke them in a hurry.

Jacob Molkenboer · Founder · A Brand New Company · 9 Jul 2025 · 9 min


Last Tuesday a COO we work with installed an MCP server on her MacBook. She had read a useful thread about pulling Notion pages straight into her assistant, copy-pasted the JSON config, and within four minutes a connector she had never heard of held a read-write OAuth token for the company's entire knowledge base, plus an offline-refresh scope that survives password rotation.

She is not the problem. She did exactly what the documentation told her to do. The problem is that her laptop, on a normal workday, is now a node in an agentic system nobody at her company can name, audit, or revoke. Multiply that across her ops team, finance team, and two co-founders, and you have the MCP estate of a typical sub-€30M company in 2026.

After the recent wave of supply chain incidents in agent tooling (browser extensions quietly exfiltrating credentials from local vault apps, and marketplace connectors shipping unsigned binaries under permissive names), we wrote a four-axis checklist that every retainer client now gets walked through twice a year. None of it is exotic. It is the kind of thing your insurance broker would ask if your insurance broker knew what MCP was.

Why MCP audits sit outside your existing posture

If you already have a SOC2 report, an MDM, or even just a Notion page called "Security", none of those cover this. MDM tracks what's installed at the OS level; it cannot see a JSON file living in ~/Library/Application Support/Claude/claude_desktop_config.json. SSO covers SaaS logins; it cannot see a personal OAuth token that a connector minted on the side. SOC2 covers the apps in scope at audit time, which is by definition not the new agentic surface.

The Model Context Protocol specification is also deliberately permissive. It defines how hosts, clients, and servers talk; it intentionally leaves trust, identity, and provenance to the host. That is the right design choice. It also means the work of bounding blast radius lands entirely on you.

Axis 1: tool-call provenance

For every MCP server installed on a company-managed device, you should be able to answer three questions inside five minutes.

Which human (or agent acting on a human's behalf) triggered the last 100 tool calls?
Which physical binary, at which version, is responding to those calls?
Where do those calls and their arguments end up at rest, and for how long?

"My laptop" is not an answer to question two. Neither is "the latest version of mcp-server-postgres". You want a pinned version, a checksum, and a path. We score this 0 to 3.

0  no logging; cannot enumerate installed servers
1  can list servers; no per-call log
2  per-call log, local only, no integrity guarantees
3  per-call log, signed binary, shipped to a SIEM or append-only file
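The gap between a 2 and a 3 is largely about integrity. As a minimal sketch of what "integrity guarantees" can mean for a per-call log, the snippet below chains each entry to the hash of the previous one, so an edited or deleted entry becomes detectable. The field names, the in-memory list, and the example addresses are assumptions for illustration; a real deployment would append each entry to a file or ship it to a SIEM, and a hash chain is tamper-evident, not tamper-proof.

```python
import hashlib
import json
import time

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def append_call(log: list, user: str, server: str, tool: str, args: dict) -> dict:
    """Append one tool call to the log, chained to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {
        "ts": time.time(),
        "user": user,      # who triggered the call
        "server": server,  # which MCP server answered
        "tool": tool,
        "args": args,
        "prev": prev_hash,
    }
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Return True only if no entry was altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev"] != prev_hash:
            return False
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    calls = []
    append_call(calls, "coo@example.com", "notion", "search_pages", {"query": "q3 plan"})
    append_call(calls, "coo@example.com", "notion", "get_page", {"id": "abc"})
    print(verify_chain(calls))
```

Storing the latest hash somewhere the laptop user cannot write to is what turns this from a convention into evidence.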

Sub-€30M companies almost always score 0 or 1 on first audit. Getting to 2 takes about a day per laptop fleet, mostly because you have to teach people that "command": "uvx" in a config file is not the name of an installed program. It is a downloader: it fetches and runs whatever package the arguments name.
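To make the downloader point concrete, here is a rough sketch that reads a host config and flags servers launched through a package runner without a pinned version. It assumes the config keeps its servers under a top-level "mcpServers" key with "command" and "args" fields; the runner list and the version regex are heuristics of ours, not part of any specification, and the version number in the usage example is made up.

```python
import json
import re

# Commands that fetch and run a package instead of naming an installed binary.
DOWNLOADERS = {"uvx", "npx", "pipx", "bunx"}

# Crude pin check: "pkg==1.2.3" (pip style) or "pkg@1.2.3" (npm/uvx style).
# "@latest" and npm scopes such as "@org/pkg" deliberately do not match.
PINNED = re.compile(r"(==|@)\d+(\.\d+)*")


def audit_config(config_text: str) -> list[dict]:
    """Return one finding per configured MCP server."""
    config = json.loads(config_text)
    findings = []
    for name, spec in config.get("mcpServers", {}).items():
        command = spec.get("command", "")
        args = spec.get("args", [])
        runner = command.rsplit("/", 1)[-1]  # tolerate absolute paths to the runner
        downloader = runner in DOWNLOADERS
        pinned = any(PINNED.search(str(arg)) for arg in args)
        findings.append({
            "server": name,
            "command": command,
            "downloader": downloader,
            "pinned": pinned,
            "needs_review": downloader and not pinned,
        })
    return findings


if __name__ == "__main__":
    sample = json.dumps({
        "mcpServers": {
            "postgres": {"command": "uvx", "args": ["mcp-server-postgres"]},
            "postgres-pinned": {"command": "uvx", "args": ["mcp-server-postgres@1.2.3"]},
        }
    })
    for finding in audit_config(sample):
        print(finding)
```

A pinned version is only the first of the three things asked for above; the checksum and the resolved path still have to come from the machine itself.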

Axis 2: per-tool rate limits

The first time an agent loop goes wrong, it goes wrong fast. We have seen a misconfigured planning loop send roughly 14,000 calls to a Linear workspace in a single day, blow through the customer's monthly quota by 11am, and CC the entire support inbox on a workflow that should have been internal. The agent was doing exactly what it was told. The tool layer had no governor.

Your audit needs to ask, for each connector:

What is the upstream API's rate limit on the OAuth token in use?
What is the per-tool, per-minute cap inside the MCP server itself?
What happens at the cap? Hard fail, soft degrade, or silent drop?

For most off-the-shelf MCP servers the answers are, in order: "there is one", "there isn't one", and "no idea". That is a fix you can ship in an afternoon by wrapping the server in a thin proxy that counts calls per tool per minute and returns a structured error past the threshold. If you cannot fit the proxy in, at minimum write the threshold into the system prompt and add a hard call counter in the host. The agent will not enforce a limit it cannot see.
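The counting half of such a proxy is small. The sketch below is one way to do it, assuming a sliding 60-second window per tool and a hard fail at the cap; the class name, the error shape, and the injectable clock are illustrative choices, and the part that actually forwards calls to the MCP server is left out.

```python
import time
from collections import defaultdict, deque


class ToolRateLimiter:
    """Sliding-window counter: at most N calls per tool per 60 seconds."""

    def __init__(self, max_calls_per_minute: int, clock=time.monotonic):
        self.max_calls = max_calls_per_minute
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.calls = defaultdict(deque)  # tool name -> timestamps of recent calls

    def check(self, tool: str) -> dict:
        """Record the call if allowed; otherwise return a structured error."""
        now = self.clock()
        window = self.calls[tool]
        # Drop timestamps that have aged out of the 60-second window.
        while window and now - window[0] >= 60:
            window.popleft()
        if len(window) >= self.max_calls:
            return {
                "allowed": False,
                "error": "rate_limited",
                "tool": tool,
                "retry_after_s": round(60 - (now - window[0]), 1),
            }
        window.append(now)
        return {"allowed": True, "tool": tool}


if __name__ == "__main__":
    limiter = ToolRateLimiter(max_calls_per_minute=2)
    for _ in range(3):
        print(limiter.check("create_issue"))
```

Returning a structured error instead of dropping the call silently matters: it gives the agent something it can read and back off from, which answers the third question explicitly.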

One thing worth saying out loud: an OAuth token's rate limit is a property of the token, not the tool. One agent loop that hits a rate-limited endpoint will starve every other workflow using the same token, including the ones a human is actively depending on.

Axis 3: OAuth scope blast radius

This is the axis that scares insurers, and rightly so. Most MCP servers request far broader OAuth scopes than the actual tools expose. A connector that lets an agent read calendar events frequently holds a token with calendar.events.write, contacts.readonly, and the offline refresh scope. The agent will never use most of that. A leaked token will.

The audit walks the OAuth grants for every active connector and answers three questions.

What scopes does the token hold?
Of those, which are actually exercised by tools the host exposes?
What is the worst-case action an attacker could take with that token if they extracted it now?
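The first two questions reduce to a set difference. A minimal sketch, assuming you have already pulled the granted scopes from the identity provider and mapped each exposed tool to the scopes it needs by hand; the tool name and the exact scope strings below are examples, not any provider's real identifiers.

```python
def scope_report(granted: set, tool_scopes: dict) -> dict:
    """Compare the scopes a token holds with the scopes the exposed tools need."""
    exercised = set().union(*tool_scopes.values())
    return {
        "exercised": sorted(granted & exercised),
        "unused": sorted(granted - exercised),   # pure blast radius, no benefit
        "missing": sorted(exercised - granted),  # tools that would fail today
    }


if __name__ == "__main__":
    granted = {
        "calendar.events.readonly",
        "calendar.events.write",
        "contacts.readonly",
        "offline_access",
    }
    # Hypothetical host that exposes a single read-only calendar tool.
    tool_scopes = {"list_events": {"calendar.events.readonly"}}
    print(scope_report(granted, tool_scopes))
```

Everything under "unused" is what the third question is really about: access an attacker gets and the agent never needed.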

For the underlying mechanics, the OAuth working group's scope guidance is short and worth re-reading once a year. The relevant rule for our world: scope down at the identity provider, not at the client. If the connector demands a broader scope than you want to grant, that is a fork-the-connector situation, not a "we will be careful" situation.

0  full-account scope, offline refresh, no expiry
1  broad scope, refresh, 90-day expiry
2  narrow scope, refresh, 30-day expiry
3  narrow scope, no refresh, short-lived (24h), per-user

Most popular connectors today ship at 0 or 1. Bumping to 2 is a one-evening job for any halfway competent backend developer. Bumping to 3 is harder, because some upstream identity providers do not support short-lived per-user tokens cleanly, but it is worth the work for anything that touches money, customer data, or production infrastructure.

Axis 4: the install whitelist for non-engineers

This is the one the COO actually cares about. She is not going to read your threat model. She wants to know: can I install this thing from this Twitter thread, yes or no?

Our default answer is a short list that fits on an index card. We give the non-engineer a green list (anything signed, reproducible, and read-only against an account the company already controls) and a hard rule: nothing outside the green list without a Slack message to ops. We have learned not to bother with "ask first if unsure". The people you most want to ask first are the people most confident they do not need to.

The green list at a typical client looks like this:

Read-only filesystem servers pointed at one specific folder under ~/work/
Read-only document connectors (Notion, Drive, Confluence) bound to a non-admin account
Calendar read for the user's own calendar, no write
Anything signed by the host vendor and listed in the official connector registry

The amber list (allowed with a one-paragraph justification and a 30-day review):

Anything that mints or sends email
Anything that holds payment credentials, including read-only Stripe
Anything that reads a production database, even read-only
Anything that runs shell commands locally, full stop

The red list (no, period):

Connectors installed from a paste-from-thread that nobody at the company has heard of
Anything that asks for the broadest scope variant of an OAuth grant
Anything where the binary lives on a personal GitHub account with under 200 stars and no signing

The point of the index card is not to be comprehensive. The point is to give a non-engineer COO a one-second decision. Green? Install it. Amber or red? Ping ops.
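For the ops side of that conversation, the card can be written down as a small decision function. This is a simplified sketch: the attribute names are ours, each list is reduced to a handful of boolean flags, and anything that matches no list falls through to "ask-ops", in line with the hard rule above.

```python
from dataclasses import dataclass


@dataclass
class Connector:
    name: str
    signed: bool = False
    read_only: bool = False
    company_account: bool = False       # bound to an account the company controls
    known_to_company: bool = True       # False for paste-from-thread installs
    broadest_scope: bool = False        # asks for the widest OAuth scope variant
    unsigned_personal_repo: bool = False
    sends_email: bool = False
    payment_credentials: bool = False   # includes read-only payment access
    production_db: bool = False         # includes read-only access
    runs_shell: bool = False


def classify(c: Connector) -> str:
    """Return 'red', 'amber', 'green', or 'ask-ops'. Red wins over amber, amber over green."""
    if not c.known_to_company or c.broadest_scope or c.unsigned_personal_repo:
        return "red"
    if c.sends_email or c.payment_credentials or c.production_db or c.runs_shell:
        return "amber"
    if c.signed and c.read_only and c.company_account:
        return "green"
    return "ask-ops"


if __name__ == "__main__":
    notion = Connector("notion-read", signed=True, read_only=True, company_account=True)
    shell = Connector("local-shell", signed=True, runs_shell=True)
    print(classify(notion), classify(shell))
```

Checking red first is deliberate: a signed, read-only connector that nobody at the company has heard of is still a no.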

Takeaway

The MCP estate at a small company is governed by the busiest person's tolerance for ambiguity. Give them a green list and they will use it. Give them a 12-page policy and they will install whatever they wanted to anyway.

The score, and what to do with it

We total the four axes out of 12. Anything under 6 is a this-week remediation: at minimum pin versions, narrow at least one scope, and ship the index card. A score of 6 to 9 is a this-quarter plan with a real review at the end. Above 9 is rare, and usually means we are auditing a client who already lost a token once and learned the lesson without us.
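The arithmetic is trivial, but writing it down keeps the thresholds from drifting between audits. A small sketch of the banding described above; the function name and the label strings are illustrative.

```python
def triage(provenance: int, rate_limits: int, oauth_scope: int, install_policy: int) -> tuple:
    """Total the four axis scores (0 to 3 each) and return (total, action band)."""
    scores = (provenance, rate_limits, oauth_scope, install_policy)
    for score in scores:
        if not 0 <= score <= 3:
            raise ValueError(f"axis scores run from 0 to 3, got {score}")
    total = sum(scores)
    if total < 6:
        band = "this-week remediation"
    elif total <= 9:
        band = "this-quarter plan"
    else:
        band = "above 9: rare"
    return total, band


if __name__ == "__main__":
    # A first-audit profile: servers can be listed, nothing else is in place.
    print(triage(provenance=1, rate_limits=0, oauth_scope=0, install_policy=0))
```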

The score is not the point. The point is that next time the COO installs something on a Tuesday afternoon, the answer to "what just changed in our attack surface" takes minutes, not weeks.

When we audited the MCP estate at a Rotterdam logistics client this spring, the surprise was not the bad scopes. The surprise was that two connectors had been installed by an ex-employee months earlier, were still active, and held refresh tokens that had survived the offboarding cleanly. We solved it by wrapping the host in a thin governor that revokes every token without a current SSO-linked owner, and folded that step into our standard AI agents setup. The whole fix took less than a day.
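The selection step of that kind of governor can be sketched in a few lines. This assumes you can export each connector's grants with an owner address and can get the list of currently active SSO users; both data shapes are hypothetical, and the revocation call itself is specific to each identity provider and is not shown.

```python
def orphaned_grants(grants: list, active_sso_users: set) -> list:
    """Return grants whose owner is missing or is no longer an active SSO user."""
    active = {user.lower() for user in active_sso_users}
    return [g for g in grants if (g.get("owner") or "").lower() not in active]


if __name__ == "__main__":
    grants = [
        {"connector": "notion", "owner": "coo@example.com"},
        {"connector": "drive", "owner": "former.employee@example.com"},
        {"connector": "calendar", "owner": None},  # nobody recorded as owner
    ]
    active = {"COO@example.com"}
    for grant in orphaned_grants(grants, active):
        print("revoke:", grant["connector"])
```

Treating a missing owner the same as a departed one is the conservative choice: a token nobody claims is a token nobody will notice leaking.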

If you want to start today, list every MCP server installed on every laptop in your company. Just the list. Not the scopes, not the versions, not the rate limits. If that list takes you more than thirty minutes to assemble, you already know what your first axis-1 score will be.

Key takeaway

Score every MCP server in your company on provenance, rate limits, OAuth scope, and install policy. Anything under 6 out of 12 is a this-week fix, not a quarterly one.

FAQ

Do we need an MCP audit if we already have SOC2?

Yes. SOC2 covers the apps in scope at audit time. MCP servers installed on laptops and the OAuth tokens they hold are almost certainly out of scope, and they hold real production access.

What is the fastest first step for a small ops team?

Assemble a single list of every MCP server installed across every company laptop. No scopes, no versions, just names and owners. If that takes more than thirty minutes, you already have your audit finding.

Can a non-engineer COO actually run this checklist?

She can run axis 4 (the install whitelist) on her own. Axes 1 through 3 need someone who can read a JSON config and an OAuth scope string. That is usually one afternoon of a backend developer's time.

How often should we re-run the audit?

Twice a year for the full four-axis pass. The install list (axis 4) wants a monthly five-minute spot check, because that is the surface that changes between formal audits.
