Email agent on Microsoft Graph: replacing an Outlook rule maze

A 38-person Antwerp law firm asked us to retire 312 Outlook rules without touching their iManage DMS. Here is the Microsoft Graph agent we shipped.

Jacob Molkenboer· Founder · A Brand New Company· 7 Jun 2026· 10 min

It is a Tuesday morning in February. Marleen, the practice manager at a 38-person law firm in Antwerp, opens her laptop. Her inbox has 1,118 unread. She is not behind. She is buried.

The firm runs on Outlook, Exchange Online, and iManage Work. Over a decade, the partners and paralegals have built up 312 client-side Outlook rules between them. Some flag asbestos referrals as urgent. Some move bailiff confirmations to a folder no one reads. One rule, written by a senior partner who has since left, silently deletes anything from a specific opposing counsel.

None of this is documented. Half of it does not work. New hires cannot triage their own inbox without asking a paralegal which subfolder a debt-collection writ belongs in.

This is the email the practice manager sent us. "We do not want a new DMS. We do not want a new email client. We want the rules to stop being the bottleneck."

The scope: email only, DMS untouched

Before we wrote a line of code, we drew a line. iManage Work stays. The folder structure inside iManage stays. The matter numbering stays. We do not write to the DMS. We do not read from the DMS. The agent lives one layer up, in the inbox, and hands off cleanly when a human picks up a message.

This constraint is the whole reason the project shipped in six weeks instead of six months. Every legal-tech RFP we have seen tries to swallow the DMS, the time-tracking, the billing, and the conflict checks in one bite. The firm has good tools already. They have a bad email layer.

Takeaway

A 90-day email agent that respects the existing DMS will outperform a 12-month "AI legal platform" that tries to replace it.

Microsoft Graph as the substrate

The firm runs Microsoft 365. That means Microsoft Graph is the right substrate. Not IMAP. Not an SMTP relay. Not a desktop Outlook add-in. Graph lets us do the three things we need: subscribe to inbox changes via webhooks, read and modify messages without touching the user's client, and create reply drafts that land in the user's Drafts folder for human review.

We registered a single-tenant Entra application with one delegated permission (Mail.ReadWrite) and one application permission (also Mail.ReadWrite, restricted to the 38 pilot mailboxes with an Exchange Online application access policy). The access policy is the part most teams skip. Without it, a Graph token issued to your app under the application permission can read every mailbox in the tenant. The CISO will, correctly, kill the project on the spot.

# Restrict the app's Mail.ReadWrite scope to one mail-enabled group
New-ApplicationAccessPolicy `
  -AppId 4f1b3c... `
  -PolicyScopeGroupId email-agent-pilot@firm.be `
  -AccessRight RestrictAccess `
  -Description "ABN email agent, pilot mailboxes only"

Subscribing to inbox changes

Graph change notifications are how the agent learns a new message has arrived. We POST a subscription per mailbox, renew it well before expiry, and validate every incoming notification against a client state secret. The current maximum lifetime for mail resource subscriptions is 4230 minutes, documented in the Graph subscription reference. We leave a six-hour margin and renew on a cron.

POST https://graph.microsoft.com/v1.0/subscriptions
Content-Type: application/json

{
  "changeType": "created",
  "notificationUrl": "https://agent.firm.be/graph/notifications",
  "resource": "users/{userId}/mailFolders('Inbox')/messages",
  "expirationDateTime": "2026-06-10T08:00:00Z",
  "clientState": "redacted-256-bit-secret",
  "includeResourceData": false
}

We deliberately do not request resource data in the notification payload. That feature requires asymmetric encryption setup, and it gives us no benefit here. When a notification arrives, we fetch the message body with a fresh Graph call. Two round trips, both inside one tenant, both fast.
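The renewal arithmetic and the client-state check fit in a few lines of Python. This is an illustrative sketch rather than the production code: the function names are hypothetical, the 4230-minute ceiling and six-hour margin are the figures quoted above, and the notification shape (a `value` array whose items carry `clientState` and `resourceData.id`) is assumed from the Graph change-notification format.

```python
import hmac
from datetime import datetime, timedelta, timezone

# Figures quoted in the text; adjust them if the Graph reference changes.
MAX_LIFETIME = timedelta(minutes=4230)
RENEWAL_MARGIN = timedelta(hours=6)


def subscription_body(user_id: str, notification_url: str,
                      client_state: str, now: datetime) -> dict:
    """Build the POST /subscriptions payload with the longest allowed expiry."""
    expires = now.astimezone(timezone.utc) + MAX_LIFETIME
    return {
        "changeType": "created",
        "notificationUrl": notification_url,
        "resource": f"users/{user_id}/mailFolders('Inbox')/messages",
        "expirationDateTime": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "clientState": client_state,
        "includeResourceData": False,
    }


def renewal_due(expires_at: datetime, now: datetime) -> bool:
    """True once we are inside the safety margin before expiry."""
    return now >= expires_at - RENEWAL_MARGIN


def trusted_message_ids(payload: dict, expected_state: str) -> list[str]:
    """Return message IDs only from notifications whose clientState matches."""
    ids = []
    for note in payload.get("value", []):
        supplied = str(note.get("clientState", ""))
        # Constant-time comparison avoids leaking the secret through timing.
        if hmac.compare_digest(supplied.encode(), expected_state.encode()):
            ids.append(note["resourceData"]["id"])
    return ids
```

Anything that fails the client-state check is dropped before a single Graph call is made, so a forged notification costs the agent nothing.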

The 14-matter classifier

The firm handles 14 matter types: personal injury, employment, family law, immigration, commercial contracts, real estate, bankruptcy, intellectual property, criminal defense, tax, GDPR complaints, asbestos and industrial disease, medical malpractice, and debt collection. Each has its own intake checklist, its own urgency rules, and its own paralegal pool.

The classifier is one model call per message. We use a small reasoning model for classification (the cost difference matters when you process 400 messages a day) and a larger one for the draft-reply step. The classification prompt is built from three pieces: the message subject and first 800 characters of body, the sender's known role if we have one (client, opposing counsel, court clerk, internal), and the 14 matter definitions written out as one paragraph each.
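To make the three pieces concrete, here is a minimal sketch of how such a prompt could be assembled. The wording of the instructions, the function name, and the `unknown` fallback for a missing sender role are all assumptions for illustration; only the 800-character cut and the three inputs come from the description above.

```python
from typing import Optional

MAX_BODY_CHARS = 800  # excerpt length quoted in the text


def build_classification_prompt(subject: str, body: str,
                                sender_role: Optional[str],
                                matter_definitions: dict[str, str]) -> str:
    """Assemble the prompt from the excerpt, the sender role, and the definitions."""
    excerpt = body[:MAX_BODY_CHARS]
    role = sender_role or "unknown"  # hypothetical fallback label
    # One paragraph per matter type, separated by blank lines.
    definitions = "\n\n".join(
        f"{name}: {text}" for name, text in matter_definitions.items()
    )
    return (
        "Classify the email below into one matter type.\n\n"
        f"Matter definitions:\n{definitions}\n\n"
        f"Sender role: {role}\n"
        f"Subject: {subject}\n"
        f"Body (first {MAX_BODY_CHARS} characters):\n{excerpt}\n"
    )
```

Truncating before the model call, rather than asking the model to ignore the tail, keeps the token cost per message predictable.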

The output is a strict JSON object. We validate it before any routing logic runs.

{
  "matter_type": "asbestos",
  "confidence": 0.91,
  "urgency": "high",
  "sender_role": "client",
  "action_hint": "intake_questionnaire",
  "matter_id_guess": "MAT-2024-0837",
  "reasoning_brief": "Self-described former shipyard worker, 1978 to 1991, requests case review, references prior phone call last week."
}

When confidence drops below 0.75, the agent does nothing visible. The message stays in the inbox, unmoved, undrafted, and the partner triages it the old way. No category. No draft. No silent guess. This was the single most-requested behavior from the partners.
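A minimal sketch of that gate, assuming the JSON shape shown above. The `Classification` type and `parse_classification` function are hypothetical names, and the set of known matter labels is passed in rather than hard-coded because only `asbestos` appears in the sample output.

```python
import json
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_FLOOR = 0.75  # threshold quoted in the text
STRING_FIELDS = ("matter_type", "urgency", "sender_role", "action_hint")


@dataclass(frozen=True)
class Classification:
    matter_type: str
    confidence: float
    urgency: str
    sender_role: str
    action_hint: str


def parse_classification(raw: str,
                         known_matter_types: frozenset) -> Optional[Classification]:
    """Return a routable classification, or None when the agent should do nothing."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if any(not isinstance(data.get(name), str) for name in STRING_FIELDS):
        return None
    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return None
    if not 0.0 <= confidence <= 1.0:
        return None
    if data["matter_type"] not in known_matter_types:
        return None
    if confidence < CONFIDENCE_FLOOR:
        return None  # below the floor: leave the message untouched
    return Classification(
        matter_type=data["matter_type"],
        confidence=float(confidence),
        urgency=data["urgency"],
        sender_role=data["sender_role"],
        action_hint=data["action_hint"],
    )
```

Malformed output and low confidence take the same exit: `None`, which the caller treats as "do nothing visible".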

Warning

Never let a classifier silently route low-confidence legal mail. A misfiled limitation-period notice is a malpractice claim. Fail open to the human.

Routing without folder chaos

The old rule maze moved messages into folders. We did not. Folder moves break Outlook's conversation view, hide messages from search, and make partners feel like they have lost control of their own inbox. Instead, we use Outlook categories.

Each matter type maps to a colored category. Urgency adds a second category (red for urgent, gray for routine). The partner's inbox view is grouped by category. The messages stay in the inbox. The classification is reversible with a right-click.

We considered using a single category and overloading it with structured text, the way some Outlook power users do. We rejected that. The colored bar in the Outlook category column is the cheapest piece of UI the firm pays for. A partner scanning the inbox sees red for urgent and the matter color instantly. A string buried inside a category needs a parse. The visible classification is the part that buys the partner's trust.

PATCH https://graph.microsoft.com/v1.0/users/{userId}/messages/{messageId}
Content-Type: application/json

{
  "categories": ["Matter: Asbestos", "Urgency: High"]
}

We also stamp a single-value extended property on the message with the classifier output, the model version, and the prompt hash. When a partner reports a misclassification, we can replay the exact decision two months later. This is not optional for a legal-tech deployment. The audit trail is the product.
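A minimal sketch of the PATCH body, assuming the stamp is stored as a Graph single-value extended property. The namespace GUID, the property name `AgentClassification`, and the function names are made up for illustration. The sketch also merges with the categories already on the message, on the assumption that a PATCH to `categories` replaces the whole list.

```python
import hashlib
import json

# Hypothetical namespace GUID and property name for the audit record.
AUDIT_PROPERTY_ID = (
    "String {5f0c2b1e-8a4d-4c3b-9e7a-2d6f1b0a9c11} Name AgentClassification"
)


def prompt_hash(prompt: str) -> str:
    """Stable fingerprint of the exact prompt text that was sent to the model."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def build_patch_body(existing_categories: list[str], matter_label: str,
                     urgency_label: str, classification: dict,
                     model_version: str, prompt: str) -> dict:
    """Body for PATCH /users/{userId}/messages/{messageId}."""
    # Keep whatever the partner already applied; append ours once.
    categories = list(existing_categories)
    for label in (f"Matter: {matter_label}", f"Urgency: {urgency_label}"):
        if label not in categories:
            categories.append(label)
    audit = {
        "classification": classification,
        "model_version": model_version,
        "prompt_sha256": prompt_hash(prompt),
    }
    return {
        "categories": categories,
        "singleValueExtendedProperties": [
            {"id": AUDIT_PROPERTY_ID, "value": json.dumps(audit, sort_keys=True)}
        ],
    }
```

Storing the hash rather than the prompt keeps the property small; the full prompt text would live in whatever versioned store the hash points back to.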

Draft replies that the partner finishes

For 9 of the 14 matter types, the agent drafts a reply. Not all of them. Criminal defense, asbestos, and medical malpractice are excluded. The first contact in those matters is too consequential and too fact-specific to template, and the partners told us so on day one. The other two exclusions, tax and GDPR complaints, were added after a pilot week made it clear the firm's house phrasing carries real legal weight there.

For the 9 that remain, the draft is built from a matter-specific prompt plus the firm's house style. The draft lands in the recipient partner's Drafts folder via Graph's createReply endpoint, with a category that says "ABN draft, please review."

POST https://graph.microsoft.com/v1.0/users/{userId}/messages/{messageId}/createReply

A follow-up PATCH writes the body. We never set isDeliveryReceiptRequested. We never call send. The partner edits the draft inside Outlook the way they always have, and presses send themselves. The agent's contribution is the first 80% of the typing.
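The two-step flow can be sketched as a pair of request builders plus a gate for the excluded matter types. This is an illustration under assumptions: the matter labels other than `asbestos` are hypothetical slugs, the builders only describe the requests rather than sending them, and the HTML content type is a guess at the draft format.

```python
GRAPH = "https://graph.microsoft.com/v1.0"
REVIEW_CATEGORY = "ABN draft, please review"

# The five matter types that never get a draft (hypothetical slugs).
NO_DRAFT_MATTERS = frozenset({
    "criminal_defense", "asbestos", "medical_malpractice",
    "tax", "gdpr_complaint",
})


def should_draft(matter_type: str) -> bool:
    """Drafts are only produced for matter types outside the excluded set."""
    return matter_type not in NO_DRAFT_MATTERS


def create_reply_request(user_id: str, message_id: str) -> tuple[str, str]:
    """Step 1: ask Graph to create a reply draft in the Drafts folder."""
    return ("POST", f"{GRAPH}/users/{user_id}/messages/{message_id}/createReply")


def fill_draft_request(user_id: str, draft_id: str,
                       html_body: str) -> tuple[str, str, dict]:
    """Step 2: write the body and the review category onto the new draft."""
    body = {
        "body": {"contentType": "HTML", "content": html_body},
        "categories": [REVIEW_CATEGORY],
    }
    return ("PATCH", f"{GRAPH}/users/{user_id}/messages/{draft_id}", body)
```

There is deliberately no builder for a send request. If the code cannot express the call, a bug cannot make it.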

In the first three weeks of pilot, partners sent 71% of drafts within 12 minutes of opening them, edited 24%, and discarded 5%. We log every discard with the raw message and the discarded draft, and we use those to tune the per-matter prompts each Friday.

The most useful per-matter prompt change in week three was for debt collection. The original prompt assumed every incoming debt-collection notice was on behalf of a creditor. In Belgian practice, this firm represents debtors as often as creditors, and the agent was opening with creditor-friendly language. The fix was a sender-side signal: if the sender domain belonged to a known collection agency, draft as debtor counsel. Otherwise, draft neutrally and ask the partner to mark the side. That single change cut the discard rate on debt-collection drafts from 14% to 3% inside one week.
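The sender-side signal is small enough to show in full. A minimal sketch, assuming the known agencies are kept as a set of lowercase domains; the example domain, the stance labels, and the function names are hypothetical.

```python
# Hypothetical allow-list; in practice this would be maintained by the firm.
KNOWN_COLLECTION_AGENCIES = frozenset({"incasso-voorbeeld.example"})


def sender_domain(address: str) -> str:
    """Lowercased domain part of an email address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def drafting_stance(sender_address: str,
                    agency_domains: frozenset = KNOWN_COLLECTION_AGENCIES) -> str:
    """Pick the side the draft is written from for a debt-collection message."""
    if sender_domain(sender_address) in agency_domains:
        # Mail from a collection agency: the firm's client is the debtor.
        return "debtor_counsel"
    # No reliable signal: draft neutrally and ask the partner to mark the side.
    return "neutral_ask_partner"
```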

The DMS handshake we did not build

The partners asked, halfway through the pilot, whether the agent could file the message into the right iManage workspace. We said no. We meant it.

iManage Work has a solid filing UX inside Outlook already. The risk of an agent filing a message under the wrong matter ID is not theoretical. It is a conflict-of-interest disclosure waiting to happen. What the agent does instead is surface its matter ID guess in the category bar ("Matter ID guess: MAT-2024-0837"). The partner clicks the iManage filing button in Outlook with the suggestion already on the clipboard. iManage stays the source of truth.

This was not the technically interesting choice. It was the right one.

Partner-by-partner rollout

We did not turn the agent on for 38 mailboxes at once. We started with three: the practice manager and two senior associates who had volunteered. They ran for two weeks while we tuned the per-matter prompts and watched the discard log every morning. The next wave added the personal-injury partners (four mailboxes). The wave after that added the commercial and real-estate teams (eleven mailboxes). The criminal-defense partners came last, and even then opted into classification-only mode without drafts.

The rollout sequence was not technical. It was political. The lawyers who had built the most elaborate Outlook rules were the ones most likely to feel the loss of control. Giving them the dashboard early, letting them see every classification decision and reject any draft with one click, turned that into curiosity. By week six, two of the rule-builders were drafting their own prompt tweaks in plain Dutch and emailing them to us on Friday afternoons. We folded those into the next release on Monday. That feedback loop did more for accuracy than any model change.

What broke in week two

Three things broke that we did not predict.

First, Graph webhook deliveries are not guaranteed in order. A "message created" notification can arrive after a "message moved" notification for the same message. Our first version classified messages that had already been moved to Sent Items by a partner replying from mobile. We fixed it by re-fetching the message's parentFolderId at classification time and skipping anything that had left the inbox.
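A minimal sketch of that guard. `fetch_message` is a hypothetical wrapper around a Graph GET on the message that returns the parsed JSON, or `None` when the message no longer exists; the Inbox folder ID is assumed to have been looked up once per mailbox.

```python
from typing import Callable, Optional


def message_still_in_inbox(fetch_message: Callable[[str], Optional[dict]],
                           message_id: str, inbox_folder_id: str) -> bool:
    """Re-read the message at classification time and confirm where it lives."""
    message = fetch_message(message_id)
    if message is None:
        return False  # deleted between notification and classification
    return message.get("parentFolderId") == inbox_folder_id
```

The check runs immediately before the model call, so a stale notification costs one cheap GET instead of a classification.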

Second, the firm's Dutch- and French-language correspondence was not classified consistently. The classifier was strong on English and weak on Antwerp-dialect Flemish. We added a pre-step that detects language and routes Flemish and French messages through a language-specific system prompt with five labeled examples per matter type. The confidence floor jumped from 0.61 to 0.88 on Flemish in one afternoon.

Third, Microsoft Graph throttles Outlook calls per app and per mailbox. We hit a 429 burst on a Monday morning when 38 mailboxes received a delayed Sunday-night batch of court notifications and the agent fired classification calls for all of them in parallel. We added a token-bucket limiter and a retry-with-jitter on 429 responses, with the Retry-After header treated as authoritative when Graph supplies it. The bucket has not emptied since.
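Both pieces are short. The sketch below is illustrative, not the deployed limiter: the class and function names, the rate and capacity values a caller would pass, and the "full jitter" backoff formula are assumptions. The one rule taken from the text is that a numeric Retry-After value wins over any computed delay.

```python
import random
import time
from typing import Callable, Optional


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def retry_delay(attempt: int, retry_after: Optional[str],
                base: float = 1.0, cap: float = 60.0,
                rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retrying a request that returned 429."""
    if retry_after is not None:
        try:
            # A numeric Retry-After header is treated as authoritative.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a number of seconds: fall back to backoff
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return rng() * min(cap, base * 2 ** attempt)
```

The jitter matters as much as the backoff. Thirty-eight mailboxes retrying on the same schedule would simply recreate the Monday-morning burst a few seconds later.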

Worth saying out loud: this is mundane. Most of the engineering on an email agent is not the model. It is the handshakes, the renewals, the retries, and the languages.

Where to start tomorrow

If you run a Microsoft 365 tenant and you want to know whether this pattern fits your firm, run one query. In Exchange Online PowerShell, connected to your tenant: Get-InboxRule -Mailbox <your-mailbox> | Measure-Object. Then run it for the four or five busiest mailboxes. If the total is over 200, you already have the problem. The rule maze is not a feature. It is a debt.

When we built this email agent for the Antwerp practice, the thing we kept running into was the temptation to let the agent send, file, and close loops on its own. We ended up solving it by treating the partner's inbox as the only place a human decision lives, and the agent as the fastest possible draft.

Key takeaway

A 90-day email agent that respects the existing DMS will outperform a 12-month AI legal platform that tries to replace it.

FAQ

Why Microsoft Graph instead of IMAP or an Outlook add-in?

Graph gives a tenant-wide, change-notification-driven API with mailbox-scope controls. IMAP is per-mailbox and noisy. Add-ins run client-side and miss messages opened on mobile or on another desk.

Does the agent send replies automatically?

No. Every reply lands in the partner's Drafts folder. The human reads, edits, and presses send. For five matter types we do not draft at all because the first contact carries too much legal weight.

How do you keep the agent out of mailboxes that did not opt in?

An Exchange Online application access policy restricts the app's Mail.ReadWrite application permission to the members of a single mail-enabled group. Without that policy, a Graph application token can read every mailbox in the tenant.

Why categories instead of folder moves?

Folder moves break Outlook's conversation view and hide messages from search. Categories preserve the inbox the partners already know, are reversible with a right-click, and group cleanly by matter and urgency.

Tags: email automation, ai agents, automation, integrations, workflow, case study
