
n8n audit checklist: what we score before a retrofit

Forty-three workflows, 1.4 million execution rows, the default encryption key still in the .env. This is the audit we run on every n8n install at a sub-€15M company before quoting anything new.

Jacob Molkenboer · Founder · A Brand New Company · 23 Jan 2026 · 8 min read


It's a Tuesday in Eindhoven. The operations lead at a €9M B2B distributor has handed us SSH access to a t3.medium running n8n behind Caddy. Forty-three active workflows, most authored by a developer who left last spring. The N8N_ENCRYPTION_KEY in the .env is the one printed in the quick-start guide three years ago. The execution_entity table holds 1.4 million rows. They want to add an agent that drafts replies to inbound RFQs. Before we quote that work, we audit what's already there.

We've run this audit on more than sixty Dutch SME n8n instances in the past eighteen months. The pattern is consistent enough that we wrote a checklist. The score isn't there to grade the operator — it's a forecast of what breaks the moment we layer an agent on top.

The encryption-key score

N8N_ENCRYPTION_KEY encrypts the data column on the credentials_entity table. If a Postgres dump leaks and that key sits next to it in source control, every Exact Online token, every Moneybird API key, and every Microsoft Graph refresh token in your workflows is effectively plaintext to whoever holds the dump.

We score four states. Zero points if the key is the one n8n generated on first boot and was never changed. One point if it has been rotated at least once, but not in the last twelve months. Two points if it was rotated in the last twelve months. Three points if it was rotated in the last twelve months and is pinned in a secrets manager: Doppler, 1Password, Bitwarden Secrets Manager, AWS Secrets Manager, it doesn't matter which.
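To make the edge cases explicit, here is the rubric as a small shell function. It's a sketch under our own reading of the wording, and the function name is hypothetical: "rotated once, ever" means at least one rotation but none in the last 365 days, and the secrets-manager point only counts for a recently rotated key, which matches how the FAQ describes full marks.

```shell
# Sketch: the encryption-key rubric as a function (hypothetical name).
# Assumption: 1 point = rotated at some point, but more than 365 days ago;
# the secrets-manager bonus only applies to a key rotated within 365 days.
encryption_key_score() {
  days_since_rotation="$1"  # empty string if the key was never rotated
  in_secrets_manager="$2"   # "yes" or "no"
  if [ -z "$days_since_rotation" ]; then
    echo 0
  elif [ "$days_since_rotation" -gt 365 ]; then
    echo 1
  elif [ "$in_secrets_manager" = "yes" ]; then
    echo 3
  else
    echo 2
  fi
}
```

A key that sits in Doppler but has never been rotated still scores 0 under this reading: where the key is stored doesn't change how long it has been exposed.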

n8n has no first-class rotation command. The procedure is mechanical: export decrypted, change the key, re-import.

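# 1. Export every credential, decrypted, with the current key, then
#    copy it to a RAM-backed path on the host: /tmp inside the
#    container is lost when the container is recreated
docker exec n8n n8n export:credentials --all --decrypted \
  --output=/tmp/creds.json
docker cp n8n:/tmp/creds.json /dev/shm/creds.json
docker exec n8n rm -f /tmp/creds.json

# 2. Stop n8n, replace N8N_ENCRYPTION_KEY in the env, start again.
#    Also update "encryptionKey" in /home/node/.n8n/config, or recent
#    n8n versions refuse to start on the key mismatch.
#    The DB now holds credentials the new key cannot read

# 3. Re-import: n8n encrypts plaintext credentials with the new key on write
docker cp /dev/shm/creds.json n8n:/tmp/creds.json
docker exec n8n n8n import:credentials --input=/tmp/creds.json
docker exec n8n rm -f /tmp/creds.json

# 4. Shred the plaintext copy on the host
shred -u /dev/shm/creds.json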

The listing above uses docker exec for brevity, which means the export briefly lands in the container's /tmp. In production we run it from a one-shot container instead, with the export routed through a tmpfs mount so the plaintext never touches a real disk. On a forty-credential instance, the scripted rotation takes under two minutes of downtime.
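Step 2 is the only part that isn't an n8n command. A minimal sketch of it, with hypothetical function names: generate 32 random bytes as hex and swap the N8N_ENCRYPTION_KEY line in an env file, keeping every other line. We assume a plain KEY=VALUE env file; if the key lives in a secrets manager, that tool's own update command replaces this step.

```shell
# Sketch for step 2 (hypothetical names, plain KEY=VALUE env file assumed).
new_encryption_key() {
  # 32 bytes from /dev/urandom, hex-encoded: 64 characters, no spaces.
  # -v stops od from collapsing repeated lines into "*"
  od -v -An -N32 -tx1 /dev/urandom | tr -d ' \n'
}

# Replace (or add) NAME=VALUE in an env file. The body runs in a subshell
# so the umask change stays local; the rewritten file is owner-only.
set_env_var() (
  file="$1"; name="$2"; value="$3"
  umask 077
  tmp="$file.tmp.$$"
  # Keep every line except an existing definition of NAME
  grep -v "^$name=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$name" "$value" >> "$tmp"
  mv "$tmp" "$file"
)
```

Call it as set_env_var .env N8N_ENCRYPTION_KEY "$(new_encryption_key)" between the export and the restart. Keep the old key until the re-import has succeeded: it is the only thing that can still read the current database.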

Execution history against the AVG thirty-day line

The AVG — the Dutch implementation of the GDPR — doesn't name thirty days as a hard ceiling for n8n execution logs. Article 5(1)(e) does require that personal data be kept no longer than necessary. For inbound webhook payloads from a Shopify store, a Mollie checkout, or a Formidable form, we apply thirty days as the default unless the client can argue a longer retention need in writing. Most can't.

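Default n8n behaviour is the problem on older installs. Before 1.0, self-hosted n8n kept executions forever unless EXECUTIONS_DATA_PRUNE was set; from 1.0 onwards pruning is on by default at 336 hours. Either way, switching pruning on doesn't make the backlog vanish: the pruner clears historical rows in batches, which takes a long time on a million-row table, and Postgres doesn't return the freed space to the disk without a VACUUM FULL.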
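Because the effective behaviour depends on the version and on what the .env says, we check the env file mechanically when we score this item. A sketch with hypothetical function names, assuming a plain KEY=VALUE env file. It treats a missing setting as a fail, because we want retention stated explicitly, with a maximum age at or under 720 hours (thirty days).

```shell
# Sketch: check an env file against the thirty-day line (hypothetical names).
env_value() {
  # Last definition wins; strips an inline "# comment" and double quotes
  grep "^$2=" "$1" | tail -n 1 | cut -d= -f2- \
    | sed 's/[[:space:]]*#.*$//' | tr -d '"'
}

check_retention() {
  prune=$(env_value "$1" EXECUTIONS_DATA_PRUNE)
  max_age=$(env_value "$1" EXECUTIONS_DATA_MAX_AGE)
  if [ "$prune" != "true" ]; then
    echo "EXECUTIONS_DATA_PRUNE not set to true"
    return 1
  fi
  case "$max_age" in
    ''|*[!0-9]*)
      echo "EXECUTIONS_DATA_MAX_AGE missing or not a number"
      return 1 ;;
  esac
  if [ "$max_age" -gt 720 ]; then
    echo "EXECUTIONS_DATA_MAX_AGE is $max_age hours, above 720 (30 days)"
    return 1
  fi
  echo "ok"
}
```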

The config we ship looks like this:

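EXECUTIONS_DATA_PRUNE=true
# Maximum age in hours: 720 = 30 days. Keep comments on their own line;
# not every env-file loader strips inline comments.
EXECUTIONS_DATA_MAX_AGE=720
EXECUTIONS_DATA_PRUNE_MAX_COUNT=50000
EXECUTIONS_DATA_HARD_DELETE_BUFFER=24
# s3 mode also needs the N8N_EXTERNAL_STORAGE_S3_* settings and,
# per n8n's docs, an Enterprise licence
N8N_DEFAULT_BINARY_DATA_MODE=s3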

The binary data line matters more than people think. By default n8n writes webhook attachments — invoice PDFs, KvK extracts, scans of ID — into /home/node/.n8n/binaryData on the container filesystem. That directory is not covered by execution pruning. It grows until the disk fills, and on a backup it ends up in cold storage with a retention policy nobody wrote down. Move it to S3 with a lifecycle rule, or to a Hetzner Object Storage bucket with the same.
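A quick way to size the binary-data problem during the audit is to count files older than the retention window. This sketch assumes you can reach the directory, either inside the container or through the volume's mount point on the host; the function name is ours.

```shell
# Sketch: count binary-data files older than the retention window.
# The default path is the in-container location named in the text;
# pass the host-side mount point instead if you run it outside.
stale_binary_files() {
  dir="${1:-/home/node/.n8n/binaryData}"
  days="${2:-30}"
  # -mtime +N matches files whose age, in whole days, is greater than N
  find "$dir" -type f -mtime +"$days" | wc -l | tr -d ' '
}
```

A non-zero count on an instance that claims thirty-day retention is the gap this paragraph describes: the execution rows are gone, but the attachments they referenced are still on disk.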

For the historical backlog, a one-shot SQL prune is faster than waiting for the pruner to catch up:

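-- Run with n8n stopped: VACUUM FULL takes an exclusive lock on each table
DELETE FROM execution_entity
WHERE "stoppedAt" < NOW() - INTERVAL '30 days';

-- Remove payload rows whose execution no longer exists.
-- NOT EXISTS scales better than NOT IN on a million-row table
DELETE FROM execution_data d
WHERE NOT EXISTS (
  SELECT 1 FROM execution_entity e WHERE e.id = d."executionId"
);

-- Reclaim the disk space; the payload table is usually the larger one
VACUUM FULL execution_entity;
VACUUM FULL execution_data;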

The second statement is cheap insurance: n8n splits execution metadata and payload across two tables, and it clears out any execution_data rows left behind if your schema version doesn't declare the foreign key with ON DELETE CASCADE.

The three workflows that survive a queue-mode cutover

Queue mode is the prerequisite for everything we want to add on top. The main process accepts webhooks and pushes jobs into Redis via Bull; one or more worker containers pick them up and run them. Without it, an agent that fans out twenty parallel LLM calls runs in the same process that has to accept every other webhook, and competes with all of them for one event loop and one memory budget. The official walkthrough is in the n8n queue-mode docs, and the corresponding env vars are catalogued in the configuration reference.
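For orientation, these are the settings the queue-mode split revolves around. It's a sketch of the env fragment, not a complete deployment: the Redis hostname is an assumption, and the Postgres connection settings are left out.

```shell
# Sketch: queue-mode settings, shared by the main process and every worker.
# The Redis hostname "redis" is an assumption; 6379 is Redis's default port.
EXECUTIONS_MODE=queue
QUEUE_BULL_REDIS_HOST=redis
QUEUE_BULL_REDIS_PORT=6379
# Workers decrypt the same credentials, so N8N_ENCRYPTION_KEY must be
# identical everywhere, and they need the same Postgres DB_* settings.
# Workers run the same image, started with:  n8n worker
```

The encryption key is the detail that ties this section back to the first score: a worker started with a different key can pick up a job and then fail to read the credentials it needs.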

The cutover is where in-flight sales-order webhooks go missing. The default Bull retry policy will re-enqueue a failed job up to three times. If your downstream ERP isn't idempotent, you can create duplicate sales orders in Exact for a single Shopify event. The fix is to add an idempotency key to your workflow before you flip EXECUTIONS_MODE.
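The core of an idempotency key is a claim that succeeds exactly once per event. The sketch below shows that logic in shell with hypothetical names, using mkdir as an atomic test-and-set on a local directory. In queue mode the workers are separate containers, so a real deployment needs a shared store instead, for example a table with a unique constraint in the same Postgres database n8n already uses.

```shell
# Sketch: claim-once logic for an idempotency key (hypothetical names).
# KEY_DIR is a local stand-in for a shared store such as a unique-keyed table.
KEY_DIR="${KEY_DIR:-/tmp/idempotency-keys}"

# Build a key from the webhook topic and the source record's ID,
# e.g. "orders-create-5123456789". Slashes become dashes so the key
# is a single directory name.
idempotency_key() {
  printf '%s-%s\n' "$1" "$2" | tr '/' '-'
}

# Returns 0 the first time a key is claimed, non-zero on every later call.
# mkdir either creates the directory or fails because it exists, so two
# processes racing on the same key cannot both succeed.
claim_key() {
  mkdir -p "$KEY_DIR" && mkdir "$KEY_DIR/$1" 2>/dev/null
}
```

A workflow would derive the key from the topic and the Shopify order ID, claim it before posting to Exact, and skip the post when the claim fails, so a re-enqueued job becomes a no-op instead of a second sales order.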

Warning

If a worker dies mid-execution, the job is requeued from the start, not resumed. Anything with non-idempotent side effects between the trigger and the first await point will fire twice. Audit those workflows before, not after.

Out of a typical forty-workflow instance, the three patterns that survive the cutover with zero rework are predictable:

Stateless inbound webhook → external API → 200. A Shopify orders/create webhook that maps the payload to an Exact Online sales order, posts it, returns 200 with the new order ID. No Wait node, no local filesystem, no instance state.
Cron-triggered DB-to-DB transforms. Every fifteen minutes, read new rows from a Postgres source, transform, write to a Postgres destination. The worker has no state the next worker doesn't also have.
Manually-triggered async backfills. The 'run once' workflows the operations lead fires from the UI to reprocess a day of orders. Nothing downstream is waiting on the HTTP response, so a slow handoff is invisible.

Those three are the canary set. We migrate them first, watch for seventy-two hours, then move the next batch in groups of five. The workflows that need rework before they can move are also predictable: anything with a Wait node in resume-on-webhook mode, anything reading or writing the local filesystem outside of binary data mode, anything that uses $env for runtime config the workers don't have, and anything that responds synchronously to a webhook with a body that takes more than ten seconds to compute.
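Most of that rework list can be pre-screened from an export before anyone opens a workflow by hand. A heuristic sketch with hypothetical function names: it expects one JSON file per workflow (n8n export:workflow --all --separate --output=export/) and greps for the parameter and node-type names as they appear in n8n's workflow JSON. It flags candidates for review; it can't measure the ten-second response time, and it will miss unusual formatting.

```shell
# Sketch: flag exported workflows that match the queue-mode rework patterns.
# Heuristic grep, not a JSON parser; handles pretty-printed and compact JSON.
flag_workflow() {
  file="$1"
  # Wait node configured to resume on an incoming webhook call
  grep -Eq '"resume": ?"webhook"' "$file" && echo "$file: wait-on-webhook"
  # Nodes that read or write the local filesystem
  grep -Eq 'n8n-nodes-base\.(readWriteFile|readBinaryFiles?|writeBinaryFile)' "$file" \
    && echo "$file: local-filesystem"
  # Expressions or code that read environment variables at runtime
  grep -Fq '$env' "$file" && echo "$file: env-at-runtime"
  # Webhooks that hold the HTTP response until the workflow finishes
  grep -Eq '"responseMode": ?"(lastNode|responseNode)"' "$file" \
    && echo "$file: sync-webhook-response"
  return 0
}

flag_export_dir() {
  for f in "$1"/*.json; do
    [ -e "$f" ] || continue
    flag_workflow "$f"
  done
}
```

Every workflow the scan doesn't mention is a candidate for the canary set or the batches of five; every one it does mention gets opened by hand.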

The full sheet

The encryption key, the retention setting, and the queue-mode readiness are three of eighteen items on the sheet. The other fifteen, in the order we score them:

Postgres backup frequency, retention, and the date of the last restore test
SSO or 2FA on the n8n UI: LDAP, SAML, or at minimum enforced TOTP
Reverse-proxy TLS: auto-renewal, modern ciphers, HSTS
Egress allowlist on the n8n host so a leaky workflow can't POST anywhere
Hardcoded secrets in workflow JSON: we grep the export for sk_live_, Bearer , and other common token shapes
Webhook authentication: header check, HMAC signature, or nothing
Workflow versioning via Git sync or a documented manual-export cadence
Error workflow configured and routing to a human channel
Node and n8n versions pinned, not :latest
Current n8n version against the published security advisories
Container CPU and memory limits set
Log shipping to somewhere that survives a container restart
PII inventory: which workflows ever touch a BSN, IBAN, or full name
DPA in place with every external processor the workflows call
A written DR runbook that someone other than the original author can follow
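For the webhook-authentication item, the strongest state is an HMAC signature checked against the raw request body. A sketch of that check in shell, assuming the Shopify convention of a base64-encoded HMAC-SHA256 and an openssl binary on the path; the function names are hypothetical.

```shell
# Sketch: verify a base64 HMAC-SHA256 webhook signature (Shopify-style).
# Requires openssl and base64 on the path.
hmac_b64() {
  # $1 = shared secret; the raw request body arrives on stdin
  openssl dgst -sha256 -hmac "$1" -binary | base64
}

verify_webhook() {
  secret="$1"; body_file="$2"; received="$3"
  expected=$(hmac_b64 "$secret" < "$body_file")
  # Plain string comparison is not constant-time: fine for an audit
  # spot check, not for a production verifier
  [ "$expected" = "$received" ]
}
```

During the audit we use a check like this to confirm that a webhook node actually rejects a tampered body, rather than trusting that a signature header is merely present.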

Each item scores 0–3. The maximum is 54. Anything under 30 means we quote a hardening sprint before any agent work begins, not after. Above 40, we go straight to the retrofit. Between 30 and 40 we negotiate which items move into the project scope and which stay on the backlog.
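The arithmetic is simple enough to keep next to the sheet. A sketch with a hypothetical function name: it takes the eighteen item scores in sheet order, rejects anything outside 0 to 3, and reports the total with its band. It treats 30 and 40 themselves as the negotiation band.

```shell
# Sketch: total the 18-item sheet and map it to the decision bands.
audit_decision() {
  if [ "$#" -ne 18 ]; then
    echo "expected 18 scores, got $#" >&2
    return 2
  fi
  total=0
  for s in "$@"; do
    case "$s" in
      0|1|2|3) total=$((total + s)) ;;
      *) echo "score out of range: $s" >&2; return 2 ;;
    esac
  done
  if [ "$total" -lt 30 ]; then
    band="hardening-sprint-first"
  elif [ "$total" -gt 40 ]; then
    band="straight-to-retrofit"
  else
    band="negotiate-scope"
  fi
  echo "$total $band"
}
```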

What to do today

Open your n8n instance, run n8n export:workflow --all --output=audit.json, and grep the file for sk_live_, Bearer , and any string that looks like an API token. It's a five-minute job that typically surfaces up to four hardcoded credentials, and it tells you whether you start from a 0 or a 3 on the hardcoded-secrets item. Anything it finds sits in the workflow JSON in plaintext, however well the encryption key is managed.
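The same grep as a reusable function, so it runs the same way against every client export. The token shapes are assumptions chosen for illustration (Stripe live keys, literal Bearer values, AWS access key IDs, GitHub personal access tokens); extend them for the APIs your workflows call. Matches are de-duplicated and then masked, so the scan output doesn't become another place the secrets leak.

```shell
# Sketch: scan a workflow export for hardcoded tokens (hypothetical name).
# Prints each distinct match once, truncated to its first 12 characters.
scan_for_secrets() {
  grep -Eo 'sk_live_[0-9A-Za-z]{10,}|Bearer [A-Za-z0-9._~+/-]{16,}=*|AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9]{36}' "$1" \
    | sort -u \
    | sed -E 's/^(.{12}).*$/\1.../'
}
```

Run it as scan_for_secrets audit.json. An n8n expression such as Bearer {{ $json.token }} doesn't match the Bearer pattern, because the braces fall outside the token character class, so credentials pulled in at runtime aren't reported as hardcoded.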

When we built the order-router for a B2B distributor in Brabant last spring, the queue-mode cutover broke exactly one workflow we hadn't flagged: a Wait node that resumed mid-flow on a callback from a freight carrier and depended on instance memory that didn't survive the handoff to a worker. We caught it in the canary window, which is why there's a canary window. The full audit script lives inside our process automation playbook.

Key takeaway

Before adding agents on top of n8n, audit three things: the encryption key, execution retention against AVG, and which webhooks survive a queue-mode cutover.

FAQ

What is queue mode in n8n and why does it matter for agent work?

Queue mode separates the main process that receives webhooks from the worker processes that execute them, using Redis (Bull) as the job queue. It's the prerequisite for horizontal scaling and for any agent that fans out parallel LLM calls.

How often should I rotate the N8N_ENCRYPTION_KEY?

At minimum after any team member with access leaves. We score full marks for rotation in the last twelve months combined with the key being held in a secrets manager rather than a plain .env file.

Does the AVG require a 30-day execution log retention?

No, the AVG doesn't name a number. Article 5(1)(e) sets out the storage-limitation principle, so retention must be justified by purpose. Thirty days is the default we apply unless the client documents a longer need in writing.

Which workflows break when n8n switches to queue mode?

Workflows with Wait nodes in resume-on-webhook mode, anything writing to the local filesystem outside binary data mode, anything reading $env for runtime config, and any webhook that takes more than ten seconds to respond synchronously.
