Email automation case study: legal intake, 11 days to 36 hours
An intake paralegal at a 41-person Utrecht law firm had 137 unread messages and a partner asking about a contract she had sent nine days ago. Here is how we fixed it.

It was a Thursday afternoon in late February. The intake paralegal at a 41-person Utrecht law firm had 137 unread messages in the shared mailbox and a partner on Teams asking about a contract she had sent in nine days ago. The contract was in the queue. It had been read, classified, conflict-checked. It just had not been routed to a partner yet. The mailbox was the bottleneck, and everyone in the firm knew it.
The 11-day queue
The firm specialises in commercial disputes and mid-market M&A. Every contract intake from a new prospect lands in one shared mailbox at intake@[firm].nl. From there, a single paralegal reads the message, opens the attachment, runs a conflict check against the firm's matter database, decides which practice group it belongs to, writes a one-paragraph summary, and forwards it to the right partner.
When we measured the queue in January 2026, the median time from "email lands in inbox" to "partner has it on their desk" was 11.2 days. The 95th percentile was 19 days. Two of the firm's biggest deals in the last twelve months almost died waiting in that queue. One did die.
The cause was not the paralegal. She was excellent. The cause was that one person was doing seven steps on behalf of forty-one people, and the seven steps had no shared state.
Why we did not buy a SaaS
The firm had already trialled two off-the-shelf legal intake products in 2025. Both were rejected by the partners for the same three reasons.
The first product's classifier was a black box. Partners could not see why a contract had been tagged "real estate" instead of "corporate". When the classifier was wrong, nobody at the firm could inspect or fix it.
The second product's audit trail lived inside the vendor's database. The firm's compliance officer wanted every routing decision in their own systems, on their own backups, queryable with their own SQL. "In someone else's cloud" was a non-starter.
Both products also forced the prospect to fill out an intake wizard before a human at the firm would respond. Prospects hated the wizards. The firm lost at least one engagement to a competitor who simply published an email address. We were brought in to rebuild the same workflow inside the firm's own Microsoft 365 tenant, with the firm's own Postgres database, and with the prospect still emailing a normal address.
The Microsoft Graph subscription
The firm runs on Microsoft 365. The first technical decision was: poll the mailbox, or subscribe?
Polling is the obvious thing. Run a job every minute, list messages since the last cursor, process them. It works. It is also wasteful, and it gives you a one-minute floor on intake latency.
Microsoft Graph supports webhook-style change notifications on a mailbox. You register a subscription, Graph posts to your endpoint when a new message arrives, and you fetch the message by ID. End-to-end latency in production is around two seconds. The Microsoft Graph change notifications documentation covers the validation handshake; the single most common reason teams give up on Graph webhooks is failing to respond to the initial validation token within ten seconds, so make that endpoint synchronous and fast.
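    import { Client } from '@microsoft/microsoft-graph-client'

    // authProvider is the app registration's authentication provider, configured elsewhere.
    const graph = Client.initWithMiddleware({ authProvider })

    // Ask Graph to notify us whenever a message is created in the intake inbox.
    const subscription = await graph.api('/subscriptions').post({
      changeType: 'created',
      notificationUrl: 'https://intake.firm.nl/graph/webhook',
      lifecycleNotificationUrl: 'https://intake.firm.nl/graph/lifecycle',
      resource: "users/intake@firm.nl/mailFolders('Inbox')/messages",
      // Two days from now, comfortably inside the mailbox subscription limit.
      expirationDateTime: new Date(Date.now() + 1000 * 60 * 60 * 24 * 2).toISOString(),
      // Shared secret that Graph echoes back on every notification.
      clientState: process.env.GRAPH_CLIENT_STATE,
    })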
Mailbox subscriptions max out at about three days, so we renew on a twelve-hour cron. The clientState is a shared secret we verify on every inbound webhook, because the notification URL is reachable from the public internet.
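The decision logic behind that webhook endpoint, and the timestamp the renewal cron needs, can be sketched without any framework. This is a minimal sketch, not the firm's code. It assumes the web framework has already URL-decoded the query string and parsed the JSON body, and the helper names (replyToGraphPost, sameSecret, trustedNotifications, renewalExpiry) are hypothetical. Only the validationToken query parameter, the plain-text echo, and the clientState field come from Graph itself.

```typescript
// Hypothetical helper names; only the field names come from Microsoft Graph.
interface GraphNotification {
  subscriptionId: string
  resource: string
  clientState?: string
}

interface WebhookReply {
  statusCode: number
  contentType: string
  body: string
}

// Graph validates a new subscription by POSTing ?validationToken=... to the
// notification URL. Echo the token back as text/plain with a 200, quickly.
function replyToGraphPost(query: Record<string, string | undefined>): WebhookReply {
  const token = query['validationToken']
  if (token !== undefined) {
    return { statusCode: 200, contentType: 'text/plain', body: token }
  }
  // Real notification: acknowledge now, fetch and classify the message afterwards.
  return { statusCode: 202, contentType: 'text/plain', body: '' }
}

// Compare two secrets without returning early at the first mismatch.
function sameSecret(a: string, b: string): boolean {
  let diff = a.length ^ b.length
  const len = Math.max(a.length, b.length)
  for (let i = 0; i < len; i++) {
    // An out-of-range charCodeAt returns NaN, which the ^ operator treats as 0.
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i)
  }
  return diff === 0
}

// Keep only notifications whose clientState matches the shared secret.
function trustedNotifications(batch: GraphNotification[], expected: string): GraphNotification[] {
  if (expected === '') return [] // refuse to trust anything without a configured secret
  return batch.filter((n) => sameSecret(n.clientState ?? '', expected))
}

// New expirationDateTime for a renewal, as an ISO 8601 string.
function renewalExpiry(now: Date, hoursAhead = 48): string {
  return new Date(now.getTime() + hoursAhead * 60 * 60 * 1000).toISOString()
}
```

The twelve-hour cron would send renewalExpiry(new Date()) as the expirationDateTime in a PATCH to /subscriptions/{id}. In Node, crypto.timingSafeEqual is a sturdier choice than the hand-rolled comparison shown here.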
The classifier and the confidence floor
Once a message arrives, we run it through a classifier that produces three labels with a confidence score: a practice group (corporate, real estate, employment, litigation, IP, other), a client type (new prospect, existing client, opposing party, noise), and an urgency level (standard, time-sensitive, after-hours).
The classifier is a small language model fine-tuned on eighteen months of the firm's historical intake mail, with the partners' actual routing decisions as labels. We did not put a general-purpose chat model in the loop. It was overkill for the task, slower per call, and the partners did not trust it. A fine-tuned classifier on labelled in-domain data is the boring, correct answer.
The interesting decision was the confidence threshold. The model is right about 94% of the time on practice group. If you route every message automatically, a 6% error rate means roughly one wrong routing in every seventeen contracts. That is unacceptable in a litigation practice, where a misrouted contract can blow a conflict check before the firm even knows it exists.
So the classifier does not route. It proposes. Every classification with a confidence below 0.85 lands on the review board. About 30% of intake hits the board on a normal day. The paralegal accepts or corrects in one click. The corrections feed back into the next fine-tune.
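A minimal sketch of the confidence floor, assuming the 0.85 threshold above. The type and function names are hypothetical. What happens to proposals at or above the floor is also an assumption: here they stay pending like everything else and are simply not flagged, so nothing is ever routed without a human.

```typescript
// Threshold taken from the article; everything else here is illustrative.
const CONFIDENCE_FLOOR = 0.85

interface Proposal {
  graphMessageId: string
  practiceGroup: string
  confidence: number // classifier score in the range 0..1
}

interface BoardEntry extends Proposal {
  status: 'pending' // the classifier only proposes, so every entry starts pending
  flaggedForReview: boolean
}

function toBoardEntry(p: Proposal): BoardEntry {
  // The negated form also rejects NaN.
  if (!(p.confidence >= 0 && p.confidence <= 1)) {
    throw new RangeError(`confidence out of range: ${p.confidence}`)
  }
  return {
    ...p,
    status: 'pending',
    flaggedForReview: p.confidence < CONFIDENCE_FLOOR,
  }
}
```

Keeping the floor in one named constant makes it easy to revisit after each fine-tune, when the score distribution may shift.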
The Postgres review board
The review board is a single Postgres table and a small Next.js page. That is the whole thing.
    -- One row per classifier proposal. Assumes partners and staff tables already exist.
    create table intake_review (
      id                      uuid primary key default gen_random_uuid(),
      graph_message_id        text not null unique,
      received_at             timestamptz not null,
      from_address            text not null,
      subject                 text not null,
      body_preview            text not null,
      attachment_count        int not null default 0,
      -- What the classifier proposed
      proposed_practice_group text not null,
      proposed_client_type    text not null,
      proposed_urgency        text not null,
      confidence              numeric(4,3) not null,
      status                  text not null default 'pending'
        check (status in ('pending', 'accepted', 'corrected', 'rejected')),
      -- What the human decided
      final_practice_group    text,
      final_partner_id        uuid references partners(id),
      reviewed_by             uuid references staff(id),
      reviewed_at             timestamptz,
      routed_at               timestamptz
    );

    -- Partial index: covers only the rows the board page actually lists.
    create index intake_review_pending_idx
      on intake_review (received_at)
      where status = 'pending';
Every classification proposal becomes a row. The partial index on pending rows keeps the board page fast even when the table has a hundred thousand history rows.
Postgres also runs the realtime layer. When a row is inserted as pending, a trigger fires NOTIFY intake_review_new. The Next.js server holds an open LISTEN connection and pushes the new row to the board page over a websocket. The whole "realtime dashboard" is about forty lines of code. We were tempted to add Redis or a queue. We did not need to. Postgres has offered LISTEN/NOTIFY for more than two decades, and it still does this job well.
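The listening side of that setup might look like the sketch below. It assumes the trigger sends the new row's id as the notification payload, which the write-up does not specify. NotifyingClient is a hypothetical structural type covering only the two node-postgres Client methods used, so the block compiles without the pg package. listenForNewRows is likewise an illustrative name.

```typescript
// Shape of the message node-postgres emits on its 'notification' event.
interface PgNotification {
  channel: string
  payload?: string
}

// Hypothetical minimal view of a pg Client: just query() and on('notification').
interface NotifyingClient {
  query(sql: string): Promise<unknown>
  on(eventName: 'notification', handler: (msg: PgNotification) => void): unknown
}

const CHANNEL = 'intake_review_new'

async function listenForNewRows(
  client: NotifyingClient,
  onRowId: (rowId: string) => void,
): Promise<void> {
  // Register the handler before issuing LISTEN so no notification slips past.
  client.on('notification', (msg) => {
    if (msg.channel !== CHANNEL || !msg.payload) return
    onRowId(msg.payload) // assumption: the payload is the intake_review row id
  })
  await client.query(`LISTEN ${CHANNEL}`)
}
```

The callback would typically load the row by id and push it to connected board pages over the websocket. A LISTEN connection has to be a dedicated, long-lived client rather than one borrowed from a pool, because the subscription belongs to that one session.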
The audit trail the compliance officer asked for
Every state transition on a row writes a record to intake_review_audit: the message ID, the previous state, the new state, the actor, the timestamp, the reason. Every classifier proposal, every paralegal correction, every partner reassignment.
We learned this the hard way. The first version of the system stored only the final state. Two weeks in, a managing partner asked why a real estate contract had been routed to litigation. We had no answer. The classifier had been correct, the paralegal had overridden it, and we had not stored the override reason. We shipped the audit table that week. The compliance officer now answers "why did this contract end up with this partner" with a single SQL query, which is the entire point.
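The rule we took from that incident, that no state change happens without a recorded actor and reason, can be sketched as follows. The field names are assumptions, since the real intake_review_audit columns are not shown here, and an in-memory array stands in for the table. recordTransition and historyFor are hypothetical helpers.

```typescript
type ReviewStatus = 'pending' | 'accepted' | 'corrected' | 'rejected'

// Assumed shape of one audit row; the real column names may differ.
interface AuditRecord {
  graphMessageId: string
  previousState: ReviewStatus | null // null for the classifier's initial proposal
  newState: ReviewStatus
  actor: string
  reason: string
  at: string // ISO 8601 timestamp
}

function recordTransition(
  log: AuditRecord[],
  entry: Omit<AuditRecord, 'at'>,
  now: Date = new Date(),
): AuditRecord {
  // The missing override reason is exactly what we could not reconstruct later.
  if (entry.reason.trim() === '') {
    throw new Error('an audit record needs a reason')
  }
  if (entry.previousState === entry.newState) {
    throw new Error('not a state transition')
  }
  const stored: AuditRecord = { ...entry, at: now.toISOString() }
  log.push(stored) // append-only: existing records are never edited
  return stored
}

// In-memory stand-in for the compliance officer's single SQL query.
function historyFor(log: AuditRecord[], graphMessageId: string): AuditRecord[] {
  return log.filter((r) => r.graphMessageId === graphMessageId)
}
```

Rejecting an empty reason at write time is a design choice: it costs the paralegal a few words per correction and makes the later question answerable.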
What changed in 36 hours
After eight weeks in production, the median time from "email arrives" to "partner has it on their desk" dropped from 11.2 days to 36 hours. The 95th percentile dropped from 19 days to 4 days. Paralegal hours on intake triage went from roughly 22 hours per week to 6. Misrouted contracts went from 4 in the previous quarter to 1 in the eight weeks since launch, and that one was a corner case the classifier had never seen in training.
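For readers who want to track the same two numbers, here is a small sketch of computing a median and a 95th percentile from received and routed timestamps. Which percentile definition produced the figures above is not recorded in this write-up. The sketch assumes the nearest-rank method, and the function names are illustrative.

```typescript
// Hours elapsed between two ISO 8601 timestamps.
function hoursBetween(receivedAt: string, routedAt: string): number {
  return (Date.parse(routedAt) - Date.parse(receivedAt)) / (60 * 60 * 1000)
}

// Median: the middle value, or the mean of the two middle values.
function median(values: number[]): number {
  if (values.length === 0) throw new RangeError('median of an empty list')
  const sorted = [...values].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 1
    ? sorted[mid]!
    : (sorted[mid - 1]! + sorted[mid]!) / 2
}

// Nearest-rank percentile: the smallest value with at least p% of the data at or below it.
function nearestRankPercentile(values: number[], p: number): number {
  if (values.length === 0) throw new RangeError('percentile of an empty list')
  if (!(p > 0 && p <= 100)) throw new RangeError('p must be in (0, 100]')
  const sorted = [...values].sort((a, b) => a - b)
  const rank = Math.ceil((p * sorted.length) / 100)
  return sorted[rank - 1]!
}
```

Against the review table, the inputs would be the received_at and routed_at values of every routed row in the measurement window.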
The 36-hour median is not a technical limit. The system itself routes in seconds. The 36 hours is the firm deciding that every classification should be human-reviewed during business hours, regardless of confidence. They could push the median down to two hours by auto-routing high-confidence intake outside business hours. They chose not to. Lawyers, predictably, prefer the human in the loop.
That preference is not just lawyer-brain. People who build systems for a living tend to distrust systems they cannot inspect. The classifier this firm trusts is the one whose decisions they can review, override, and query in SQL. That is also the classifier that survived the partner meeting and the compliance review.
Three things we would do differently
Start with the review board, not the classifier
We spent the first two weeks tuning the classifier and built the review board last. That was backwards. The review board would have been useful even with a zero-percent classifier, because the paralegal was already triaging by hand. We should have shipped the empty board in week one and let the classifier earn its place against a working baseline.
Treat the Graph subscription as a fragile dependency
Subscription renewals succeed silently most of the time and then occasionally do not. The lifecycleNotificationUrl is not optional. When Graph tells you the subscription is about to expire or has been removed, you must re-subscribe immediately and alert a human if it fails. We had one Saturday in March where we missed eleven hours of intake because we had treated the lifecycle endpoint as a nice-to-have. The fix is fifteen lines. Write them on day one.
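A sketch of what those fifteen lines might look like. The three lifecycleEvent values are the ones Graph sends to a lifecycle endpoint. The LifecycleDeps interface and its four functions are hypothetical stand-ins for the real Graph calls and paging hook. The backfill step after a gap is an assumption about how missed mail would be recovered, not something the firm's code is known to do.

```typescript
type LifecycleEvent = 'reauthorizationRequired' | 'subscriptionRemoved' | 'missed'

interface LifecycleNotification {
  subscriptionId: string
  lifecycleEvent: LifecycleEvent
}

// Hypothetical dependencies, injected so the handler stays testable.
interface LifecycleDeps {
  renew(subscriptionId: string): Promise<void> // e.g. PATCH /subscriptions/{id}
  recreate(): Promise<void> // e.g. POST /subscriptions
  backfill(): Promise<void> // assumption: re-list messages since the last one processed
  alertHuman(message: string): Promise<void>
}

async function handleLifecycle(
  n: LifecycleNotification,
  deps: LifecycleDeps,
): Promise<'ok' | 'alerted'> {
  try {
    switch (n.lifecycleEvent) {
      case 'reauthorizationRequired':
        await deps.renew(n.subscriptionId)
        break
      case 'subscriptionRemoved':
        await deps.recreate()
        await deps.backfill()
        break
      case 'missed':
        await deps.backfill()
        break
    }
    return 'ok'
  } catch (err) {
    // Recovery failed: a person has to know before the next message arrives.
    await deps.alertHuman(
      `Graph lifecycle ${n.lifecycleEvent} failed for ${n.subscriptionId}: ${String(err)}`,
    )
    return 'alerted'
  }
}
```

Wiring alertHuman to something that actually pages a person on a Saturday matters more than the rest of the handler.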
Never put the prospect in a form
The firm's biggest non-technical win was that nothing changed for the prospect. They still send an email to a human-readable address. They still get a reply from a real partner within hours. The classifier and the board are invisible to them. Every legal intake SaaS we looked at tried to insert a form between the prospect and the firm. That is the wrong place to put friction. Put the friction inside your team's workflow, where you can iterate on it, not in front of the customer.
The five-minute thing you can do today
When we built the intake agent for this firm, the thing we kept hitting was the gap between what a classifier could do alone and what a partner would accept. We ended up solving it with a confidence floor, a Postgres review board, and an audit table that put the human first and made every decision queryable. If your team is sitting in a similar queue, the smallest useful thing you could do today is open the shared mailbox, count the messages older than 72 hours, and write down what each of them is waiting for. That spreadsheet is the spec for the AI agent you eventually build.
Key takeaway
The classifier never decides. It proposes. A human accepts the routing, and the audit trail records who accepted what and when.
FAQ
Why Microsoft Graph instead of IMAP?
Graph offers webhook-style change notifications with roughly two-second latency, so you do not poll. It also fits the firm's existing Microsoft 365 admin and consent model, which IMAP credential handling does not.
Why Postgres instead of a queue or pub/sub broker?
The review board is fundamentally state, not events. Postgres LISTEN/NOTIFY gives realtime updates without a separate broker, and the same database holds the audit trail the compliance officer needed.
How did you build the audit trail?
Every state transition writes a row to a separate audit table with the message ID, previous state, new state, actor, timestamp, and reason. The compliance officer queries it directly with SQL whenever a routing decision is challenged.
Could the classifier route automatically above a confidence threshold?
Technically yes. The firm chose not to. Every classification is human-reviewed during business hours, which is why the median is 36 hours rather than minutes. They prefer a slightly slower system they can fully audit.