
Automation audit checklist for LLM steps touching PII

A Gmail trigger in n8n quietly fed 612 customer rows into a third-party HTTP node last month. Here is the audit we now run before any LLM step touches PII.

Jacob Molkenboer · Founder · A Brand New Company · 13 Jun 2026 · 9 min


It is a Tuesday at 14:00. An operations lead at a 40-person fintech opens her n8n dashboard after lunch and pulls up the overnight log. The "enrich new leads" workflow ran 612 times. Each run pushed the full contact card into an HTTP node that turned out to belong to a freelance tool the previous ops manager built two years earlier. The credentials still worked. The endpoint still answered 200 OK. The data is now in someone else's logs.

This is not a hypothetical. We see a version of it every few months on the audits we run for sub-€15M clients before they wire an LLM step into a workflow that touches a customer record. The pattern is always the same. A trigger with more OAuth scope than the workflow needs. A downstream HTTP node nobody remembers approving. An LLM step that turns "summarise this lead" into "ship the raw row to a third party for inference." The fix is boring. It is a checklist.

Why the LLM step is the worst place to find this

You can survive an over-scoped Gmail trigger if the data never leaves the workflow's memory. The moment an LLM step enters the picture, three things change.

First, the row is serialised into a prompt and sent to a model provider. Even with a zero-retention contract in place, that prompt still crosses transit and queue infrastructure you do not control, and depending on the provider's terms it may sit in abuse-monitoring systems for days. Second, the model's reply is treated as trusted text by the next node in the workflow. Prompt injection from the lead's own message body becomes a control-flow vulnerability, not a content problem. Third, the LLM step is usually the last thing bolted on to a workflow that has been running for months, which means nobody re-audits the upstream nodes when it lands.

The Open Worldwide Application Security Project lists sensitive information disclosure at LLM02 and excessive agency at LLM06 in its 2025 LLM Top 10. Both apply to no-code workflows even though most n8n and Make.com estates never get classified internally as "AI applications." The classification is what the auditors miss.

The trigger scope review

Start at the source. Every trigger in n8n, Make, and Zapier is backed by an OAuth scope or an API key with a permission set. Most operations teams accept the default scope at connect time and never revisit it.

Pull every connected app from the workflow estate and write down the actual scope granted, not the scope the docs claim is needed. For Gmail, the trigger "On new email matching search" requests gmail.readonly against the entire mailbox, not just the matched messages. For HubSpot, the default OAuth flow grants crm.objects.contacts.read across all properties, including the custom ones marked sensitive in the CRM UI. For Make's Microsoft 365 modules, the connection asks for Mail.Read across all folders, even when the scenario only reads from one shared mailbox.

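Cut anything you do not need. Gmail scopes apply to the whole mailbox, so choosing a narrower scope is the only real reduction. If the workflow only routes emails sent to billing@, have a Gmail filter apply a label to them and use gmail.metadata plus a label filter. That scope exposes headers and labels but not message bodies, which is almost always enough for routing; a step that needs the body still requires gmail.readonly. The Google docs cover the scope hierarchy in detail.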
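The comparison of granted scope against needed scope can be scripted once the list is written down. The sketch below assumes a hand-maintained inventory; the object shape and the workflow names route-billing-mail and sync-contacts are hypothetical, and only the scope strings are real provider identifiers.

```javascript
// Hypothetical inventory: one entry per connected app, filled in by hand
// from each provider's consent screen or connection settings.
const connections = [
  {
    workflow: 'route-billing-mail',
    app: 'gmail',
    granted: ['https://www.googleapis.com/auth/gmail.readonly'],
    needed: ['https://www.googleapis.com/auth/gmail.metadata'],
  },
  {
    workflow: 'sync-contacts',
    app: 'hubspot',
    granted: ['crm.objects.contacts.read'],
    needed: ['crm.objects.contacts.read'],
  },
];

// Return every connection whose granted scopes differ from what the workflow needs.
function findScopeDrift(list) {
  return list
    .map((c) => ({
      workflow: c.workflow,
      app: c.app,
      excess: c.granted.filter((s) => !c.needed.includes(s)),
      missing: c.needed.filter((s) => !c.granted.includes(s)),
    }))
    .filter((r) => r.excess.length > 0 || r.missing.length > 0);
}

console.log(JSON.stringify(findScopeDrift(connections), null, 2));
```

Only the Gmail connection is reported here: its excess list holds gmail.readonly and its missing list holds gmail.metadata, which is the swap to make at the next reconnect.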

The PII inventory

For every workflow that will get an LLM step, list the fields that flow through it. Not "customer record." The actual columns: email, phone, full address, national ID, IBAN, free-text notes that may contain anything.

This sounds tedious. It takes 20 minutes per workflow. The output is a short inventory, one line per field with its classification, that gets pinned to the workflow description. Without it, the next step is impossible.

workflow: enrich-new-leads
trigger: gmail.message.new
fields_in_motion:
  - sender_email          # PII
  - sender_name           # PII
  - message_body          # may contain PII
  - thread_id             # internal
  - hubspot_contact_id    # internal
llm_step:
  receives: [sender_first_name, message_body_redacted]
  forbidden: [sender_email, thread_id, hubspot_contact_id]
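An inventory is more useful when something enforces it. As a minimal sketch, assuming the llm_step block is copied into a plain object, a check like this could run in a test or in a node just before the model call. The function name checkLlmPayload is hypothetical.

```javascript
// Copied from the llm_step block of the inventory.
const llmStep = {
  receives: ['sender_first_name', 'message_body_redacted'],
  forbidden: ['sender_email', 'thread_id', 'hubspot_contact_id'],
};

// List every way a payload violates the inventory; an empty array means it passes.
function checkLlmPayload(step, payload) {
  const problems = [];
  for (const key of Object.keys(payload)) {
    if (step.forbidden.includes(key)) {
      problems.push(`forbidden field present: ${key}`);
    } else if (!step.receives.includes(key)) {
      problems.push(`undeclared field present: ${key}`);
    }
  }
  return problems;
}

// A payload that leaks the sender's address is rejected.
console.log(checkLlmPayload(llmStep, {
  sender_first_name: 'Ana',
  message_body_redacted: 'Please call me about my invoice.',
  sender_email: 'ana@example.com',
}));
```

Treating undeclared fields as failures, not just forbidden ones, means a new upstream column cannot reach the prompt until somebody adds it to the inventory on purpose.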

The LLM boundary

Decide what the LLM step is actually allowed to see. The default in n8n's OpenAI node and Make's AI modules is to pipe the entire upstream JSON into the prompt. That is the wrong default for any workflow touching PII.

Insert a redaction node directly before the LLM step. In n8n that means a Code node (the Function node in older versions). In Make it is a Tools > Set Multiple Variables module. The redaction reduces the payload to the minimum the model needs to do its job, and substitutes a correlation token for anything the downstream nodes need to look up later.

// n8n Code node ("Run Once for Each Item" mode), placed immediately before the OpenAI node.
// On self-hosted n8n, built-in modules must be allowed, e.g. NODE_FUNCTION_ALLOW_BUILTIN=crypto.
const crypto = require('crypto');
const item = $input.item.json;

// Refuse to run without a salt: concatenating an undefined salt would
// silently produce a guessable hash.
const salt = $env.WORKFLOW_SALT;
if (!salt) {
  throw new Error('WORKFLOW_SALT is not set');
}

// Hash anything we need to correlate later but the model should not see
const correlationId = crypto
  .createHash('sha256')
  .update(String(item.sender_email || '').trim().toLowerCase() + salt)
  .digest('hex')
  .slice(0, 12);

// Redact free text with a regex sweep, then forward only the minimum.
// IBANs go before phone numbers; otherwise the phone pattern consumes the
// IBAN's digit run and the IBAN pattern never matches.
const redactedBody = (item.message_body || '')
  .replace(/[\w.+-]+@[\w-]+\.[\w.-]+/g, '[EMAIL]')
  .replace(/\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]){11,30}\b/g, '[IBAN]')
  .replace(/\+?\d[\d\s().-]{7,}/g, '[PHONE]');

return {
  json: {
    correlation_id: correlationId,
    sender_first_name: (item.sender_name || '').trim().split(/\s+/)[0],
    message_body_redacted: redactedBody.slice(0, 2000),
  },
};

The correlation_id round-trips through the LLM and lets the downstream node look up the real record from your own database. The model never sees the email address. If a prompt injection lands, the worst it can do is corrupt the summary, not exfiltrate the row.
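Keeping the damage down to a corrupted summary also depends on the next node not trusting the reply. The sketch below assumes the prompt asks the model for a JSON object with correlation_id and summary keys; that contract, the 1000-character cap, and the function name parseModelReply are illustrative choices, not n8n or provider features.

```javascript
// Validate the model's reply before any downstream node acts on it.
function parseModelReply(rawReply, expectedCorrelationId) {
  let parsed;
  try {
    parsed = JSON.parse(rawReply);
  } catch (err) {
    return { ok: false, reason: 'reply is not valid JSON' };
  }
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    return { ok: false, reason: 'reply is not a JSON object' };
  }
  // The id must be the one we sent, never one the model supplies on its own.
  if (parsed.correlation_id !== expectedCorrelationId) {
    return { ok: false, reason: 'correlation_id mismatch' };
  }
  if (typeof parsed.summary !== 'string' || parsed.summary.length > 1000) {
    return { ok: false, reason: 'summary missing or too long' };
  }
  // Forward only the fields we asked for; drop anything extra the model added.
  return { ok: true, correlation_id: expectedCorrelationId, summary: parsed.summary };
}

const reply = '{"correlation_id":"a1b2c3d4e5f6","summary":"Asks about an invoice.","next_url":"http://x"}';
console.log(parseModelReply(reply, 'a1b2c3d4e5f6'));
```

The extra next_url key in the example is dropped on the way out, so an injected instruction cannot smuggle a destination or a different record id into later nodes.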

The HTTP node allowlist

Every HTTP Request node in the estate gets its destination hostname written down. Compare that list against the workflow's stated purpose. Anything pointing at a domain nobody can explain gets disabled, not deleted, while you trace its history.

The 612-row leak we opened with was an HTTP node pointing at a Heroku app deployed in 2023 by a contractor who has since left. The endpoint accepted whatever payload it received and never returned an error. Nobody noticed because the workflow's success metric was "200 OK." A simple allowlist on the n8n instance, plus a quarterly review of HTTP destinations, would have surfaced it the same week the contractor left.

Warning

A 200 OK from an HTTP node is not proof of correctness. It is proof that something answered. If the workflow logs only the status code, you have no audit trail when an endpoint silently changes hands.
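The destination list can be pulled from exported workflow JSON instead of clicked through by hand. This is a sketch under assumptions: it expects HTTP Request nodes to carry the type n8n-nodes-base.httpRequest with the destination in parameters.url, as n8n exports do at the time of writing, and the allowlisted hostnames and the sample workflow are hypothetical.

```javascript
// Hostnames the operations lead has approved (hypothetical values).
const allowedHosts = new Set(['api.hubapi.com', 'api.openai.com']);

// Report HTTP Request nodes whose destination is unknown or not approved.
function auditHttpNodes(workflow, allowed) {
  const findings = [];
  for (const node of workflow.nodes || []) {
    if (node.type !== 'n8n-nodes-base.httpRequest') continue;
    const rawUrl = String((node.parameters && node.parameters.url) || '');
    let host = null;
    try {
      host = new URL(rawUrl).hostname.toLowerCase();
    } catch (err) {
      // Expressions such as "={{ $json.url }}" cannot be resolved statically.
    }
    if (host === null) {
      findings.push({ node: node.name, host: null, status: 'unresolved' });
    } else if (!allowed.has(host)) {
      findings.push({ node: node.name, host, status: 'not-allowlisted' });
    }
  }
  return findings;
}

// Hypothetical export with one approved, one unknown, and one dynamic destination.
const exported = {
  nodes: [
    { name: 'HubSpot lookup', type: 'n8n-nodes-base.httpRequest', parameters: { url: 'https://api.hubapi.com/crm/v3/objects/contacts' } },
    { name: 'Enrich', type: 'n8n-nodes-base.httpRequest', parameters: { url: 'https://old-tool.example.com/ingest' } },
    { name: 'Callback', type: 'n8n-nodes-base.httpRequest', parameters: { url: '={{ $json.callback_url }}' } },
  ],
};
console.log(auditHttpNodes(exported, allowedHosts));
```

Unresolved destinations are reported, not skipped, because a URL built from an expression is exactly the kind of node nobody can explain from the canvas alone.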

Credential hygiene

Open the credentials store in n8n or the connections view in Make. Sort by "last used." Anything unused for more than 90 days that still has live credentials gets rotated or removed. Anything tied to a former employee's Google account gets revoked at the IdP, not at the workflow level. Revoking at the workflow level leaves the token alive everywhere else it has been copied.
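The 90-day rule is simple enough to automate against an exported list. The record shape below (name, lastUsedAt) is hypothetical; fill it from whatever your credentials view or export provides.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Return the names of credentials that have not been used within maxAgeDays.
function findStaleCredentials(credentials, now, maxAgeDays = 90) {
  return credentials
    .filter((c) => {
      // A credential with no recorded use is treated as stale.
      if (!c.lastUsedAt) return true;
      return now - new Date(c.lastUsedAt).getTime() > maxAgeDays * DAY_MS;
    })
    .map((c) => c.name);
}

const sampleCredentials = [
  { name: 'gmail-billing', lastUsedAt: '2026-06-01T08:00:00Z' },
  { name: 'heroku-enrich', lastUsedAt: '2025-04-02T10:30:00Z' },
  { name: 'never-used', lastUsedAt: null },
];
const auditDate = new Date('2026-06-13T00:00:00Z').getTime();
console.log(findStaleCredentials(sampleCredentials, auditDate));
```

Passing the audit date in as an argument keeps the check reproducible, so the same export gives the same answer when somebody reruns it during an incident review.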

For LLM provider keys specifically, use a separate key per workflow with a spend cap. Where a provider applies budget limits to a project or workspace rather than to an individual key, give each workflow its own project or workspace. If a runaway loop starts iterating over your contact table at three in the morning, the cap stops the bleeding before the invoice does. The recent story of an AI agent that bankrupted its operator while scanning DN42 is a clean illustration of why an unbounded credential on a long-running workflow is its own category of risk, separate from data leakage.

Logging that lets you replay

The execution log needs to record three things for every PII-touching step: the correlation_id, the redacted payload sent to the LLM, and the raw response. The unredacted record never goes into the log.

In n8n this means turning on "Save data on success" with a payload size limit, then writing a separate audit row to your own Postgres that ties correlation_id back to the customer row. In Make.com it means using the History feature with the data-store-write module for the audit record. Either way, the principle is the same: the workflow's runtime store holds the redacted version, and a separate audited store holds the mapping back to the real row.
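The split between the two stores can be made explicit in one small function. This is a sketch with illustrative field names, not an n8n or Make schema; customerId stands in for whatever key identifies the real row in your own database.

```javascript
// Split one execution into the record for the workflow log and the
// record for the separately audited mapping store.
function buildAuditRecords({ correlationId, customerId, redactedPayload, modelResponse, now }) {
  return {
    // Goes to the workflow's execution log: no direct identifiers.
    executionLog: {
      correlation_id: correlationId,
      redacted_payload: redactedPayload,
      model_response: modelResponse,
      logged_at: now,
    },
    // Goes to your own access-controlled database: the only place
    // where the correlation_id is tied back to the customer row.
    auditMapping: {
      correlation_id: correlationId,
      customer_id: customerId,
      logged_at: now,
    },
  };
}

const records = buildAuditRecords({
  correlationId: 'a1b2c3d4e5f6',
  customerId: 'cust_0042',
  redactedPayload: { sender_first_name: 'Ana', message_body_redacted: 'Reach me at [PHONE].' },
  modelResponse: 'Asks for a callback about an invoice.',
  now: '2026-06-13T14:00:00Z',
});
console.log(records.executionLog);
```

Because the customer identifier only ever appears in auditMapping, a leaked or over-shared execution log still cannot be joined back to a person without access to the second store.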

The reason this matters is the data-subject access request. When a customer writes in and asks "why did your system call me," you need to reconstruct the decision the model made without re-leaking the data. The correlation_id lets you replay against your own copy. Without it, you either show the customer their own raw row trapped inside a vendor log, or you tell them you cannot answer the question. Both are bad.

The kill switch

Every workflow that touches PII needs an unconditional disable path that does not require logging into the n8n UI. We do this with a row in a Postgres table the workflow checks on every execution. If the row says disabled = true, the workflow exits before any node downstream of the check runs.

CREATE TABLE workflow_kill_switch (
  workflow_id  text PRIMARY KEY,
  disabled     boolean NOT NULL DEFAULT false,
  disabled_at  timestamptz,
  disabled_by  text,
  reason       text
);

-- Disable from any psql session, including one triggered by PagerDuty
UPDATE workflow_kill_switch
   SET disabled    = true,
       disabled_at = now(),
       disabled_by = 'pagerduty-incident-9412',
       reason      = 'PII leak suspected on HTTP node 7'
 WHERE workflow_id = 'enrich-new-leads';

The first node after the trigger in every workflow is a Postgres query against this table. The query takes 8 milliseconds. The peace of mind, when the on-call engineer can stop a leaking workflow from their phone at 02:00 without remembering an SSO password, is worth the latency.
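The decision that follows the query fits in a few lines. The sketch below takes the rows returned by a SELECT of disabled and reason for one workflow_id. Halting when no row exists is an assumption on our part (fail closed); the table definition itself does not require it.

```javascript
// Decide whether to halt, given the rows returned by
//   SELECT disabled, reason FROM workflow_kill_switch WHERE workflow_id = ...
function shouldHalt(rows) {
  // Assumption: a workflow with no kill-switch row is not allowed to run.
  if (!Array.isArray(rows) || rows.length === 0) {
    return { halt: true, reason: 'no kill-switch row found (failing closed)' };
  }
  const row = rows[0];
  if (row.disabled === true) {
    return { halt: true, reason: row.reason || 'disabled without a reason' };
  }
  return { halt: false, reason: null };
}

console.log(shouldHalt([{ disabled: true, reason: 'PII leak suspected on HTTP node 7' }]));
console.log(shouldHalt([{ disabled: false, reason: null }]));
console.log(shouldHalt([]));
```

Failing closed costs one INSERT per new workflow and guarantees that a workflow nobody registered cannot quietly run without a switch.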

What we did for one fintech client

When we built the lead-enrichment AI agent for a Dutch fintech earlier this year, we found an old Make.com scenario that nobody had touched for fourteen months and that still held a Gmail OAuth token with full readonly access to the founder's personal inbox. We rebuilt the upstream trigger on a service account scoped to a single shared label, redacted at the boundary, and put every HTTP destination behind an allowlist the operations lead approves through a Slack workflow. The audit took two afternoons. The peace of mind has lasted.

The five-minute version

If you read this and do one thing today, open your n8n or Make estate, sort connected apps by date, and write down the OAuth scope of every trigger older than 90 days. Cross out anything the workflow does not actually need. The next leak is almost certainly hiding in that list.

Key takeaway

Before any LLM step touches a customer row, audit the trigger scope, redact at the boundary, allowlist HTTP destinations, and wire a kill switch.

FAQ

Does a zero-retention agreement with the model provider remove the need for redaction?

No. Zero-retention terms typically cover training and long-term storage. Prompts still cross transit infrastructure and, depending on the contract, may sit in abuse-monitoring systems for days. Redacting at the boundary keeps PII out of those windows entirely.

Why redact before the LLM step instead of after?

Once the prompt leaves the workflow, you no longer control where it sits or who reads it. Post-hoc redaction only helps your own logs. The point of redaction is to never send the row in the first place.

Is this checklist overkill for a small Zapier setup?

The trigger scope review and HTTP destination list take an hour for a small estate. Skip the Postgres kill switch and audit table if you have under five workflows. Keep the redaction node either way.

What counts as PII in a free-text email body?

Anything that identifies a person on its own or with one other field. Names, emails, phone numbers, addresses, account numbers, IBANs, national IDs. When in doubt, treat the entire body as PII and redact aggressively.

automation · ai agents · security · workflow · process automation · operations
