
Mobile apps

Apple's Gemini-backed Siri: why your mobile agent stack is fine

Apple is wiring Gemini into Siri. For small Dutch SaaS founders shipping mobile agents in 2026, the news matters less than it looks. Two on-device assumptions still hold.

Jacob Molkenboer · Founder · A Brand New Company · 9 Jun 2026 · 6 min

The keynote panic

Tuesday night, 7pm Amsterdam time. A founder we work with in Utrecht is watching the WWDC keynote on a second monitor while finishing a release. Apple wheels out the new Siri architecture, names Gemini as the brain behind the heavy lifting, and the founder pings us within ninety seconds: "do we throw out the on-device model? do we route everything through Apple Intelligence now?"

The answer was no on Tuesday. It is still no this morning. Apple putting Gemini behind Siri is a story about Apple's margins, not about your roadmap.

What actually shipped

Strip the keynote framing and the changes are narrow. Siri now hands open-ended reasoning to Gemini when the on-device dispatcher decides the request is too big for the local model. The on-device model itself, the one that has been quietly classifying intents since iOS 18, is still there. Apple's documentation for Apple Intelligence still describes the same dispatch pattern: handle locally where possible, escalate to Private Cloud Compute where needed, and now escalate once more to a third party for the long tail.

Three things about that:

  • The third-party leg is Apple's win, not yours. You did not get a new API.
  • You also did not lose one. The App Intents framework, the Core ML pipeline, and the on-device small model you ship with your binary are unchanged.
  • Whatever Gemini sees through Siri is inside Apple's contractual envelope, not yours. You cannot use it as a substitute for shipping your own model path.

If you build the kind of mobile app we build for clients (an invoicing tool, a field-service dispatcher, a Dutch-language scheduling assistant), none of this changes the work.

Assumption one: the latency budget on a 4G train

The first assumption that still holds: actions that should feel instant cannot wait on a round trip.

Take a real example. One of our clients ships a mobile app that lets bookkeepers categorise expenses by holding the phone up to a paper receipt. The whole flow has to finish in about 400 milliseconds or the user loses the moment and types it in manually. That budget breaks down roughly like this:

camera frame grab        80 ms
image preprocessing      40 ms
OCR (Vision framework)  120 ms
intent classification    60 ms
category prediction      60 ms
UI commit                40 ms
                       -----
total                  400 ms

None of that has room for a Gemini round trip. A best-case call from an iPhone on a Dutch 4G cell to a Gemini endpoint in Belgium is 180 to 350 ms before the model starts generating. On the Intercity between Amsterdam and Eindhoven, with handover gaps and the occasional tunnel, that number is closer to 800 ms. Add the model's own latency and you sit at 1.2 seconds, three times the budget, before a single token comes back.

This was the case last year. It is the case today. It will be the case in 2027. Apple's pivot to Gemini does not move the speed of light, and our clients' users still commute on the Intercity.
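The arithmetic behind that budget can be checked in a few lines. This is a minimal sketch, not code from the client app: the stage figures are the ones from the breakdown, while the `Stage` type and the `offloadable` flag (marking the two model steps a cloud call could plausibly replace) are assumptions made for illustration. Model generation time is left out, so the cloud totals are a lower bound.

```swift
// Stage budgets in milliseconds, taken from the breakdown in the text.
// `offloadable` is an assumption: it marks the model steps a cloud call could replace.
struct Stage {
    let name: String
    let ms: Int
    let offloadable: Bool
}

let stages = [
    Stage(name: "camera frame grab", ms: 80, offloadable: false),
    Stage(name: "image preprocessing", ms: 40, offloadable: false),
    Stage(name: "OCR (Vision framework)", ms: 120, offloadable: false),
    Stage(name: "intent classification", ms: 60, offloadable: true),
    Stage(name: "category prediction", ms: 60, offloadable: true),
    Stage(name: "UI commit", ms: 40, offloadable: false),
]

let budgetMs = 400
let onDeviceTotalMs = stages.reduce(0) { $0 + $1.ms }

/// Total time if the offloadable steps are replaced by a single network round trip.
/// Excludes the remote model's own generation time, so the result is a lower bound.
func totalWithCloud(roundTripMs: Int) -> Int {
    let localMs = stages.filter { !$0.offloadable }.reduce(0) { $0 + $1.ms }
    return localMs + roundTripMs
}

print(onDeviceTotalMs)                   // 400
print(totalWithCloud(roundTripMs: 180))  // 460, best-case 4G figure from the text
print(totalWithCloud(roundTripMs: 800))  // 1080, Intercity figure from the text
```

Even in the most generous reading, where the cloud call replaces both on-device model steps and the network behaves, the flow lands at 460 ms against a 400 ms budget.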

Takeaway

If a feature has to feel like a button press, it runs on the device. No keynote changes that.

Assumption two: data residency under GDPR

The second assumption: anything touching personal data or commercially sensitive content needs a path you control end to end.

Two things tightened this in the last twelve months. First, the EU AI Act's transparency and risk-classification rules now apply to providers and deployers of general-purpose AI systems, including any LLM endpoint your app calls. The European Commission's AI Act overview spells out the deployer obligations, and they do not disappear because your traffic went through Apple. Second, Dutch regulators have been quietly explicit with our clients in regulated sectors: routing customer text through a US-hosted model is fine for some workloads and not fine for others, and you must be able to draw the line on a diagram for the auditor.

Now that Apple passes a fraction of Siri's reasoning to Gemini, the data path becomes: user device, Apple, Google. Apple says Private Cloud Compute is end-to-end encrypted and that third-party queries are anonymised. That is fine for the user asking Siri what the weather is in Groningen. It is not fine when your B2B app's customer dictates a clause from a contract into a voice field and expects only your servers to see it.

So the assumption holds: if your app handles invoices, patient context, employee records, or anything a Dutch DPO would flag, the on-device path is not a fallback. It is the default, and you escalate to your own EU-hosted model only when the local one cannot finish the job.
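One way to make "a path you control end to end" checkable is to write each data path down as a list of hops and test it against the parties you actually control. The sketch below is illustrative only: the `Party` and `DataPath` types and the three example paths are hypothetical names, not part of any Apple or Google API.

```swift
// Parties that can see the data on a given path (hypothetical model for illustration).
enum Party: Hashable {
    case userDevice, ownEUServers, apple, google
}

struct DataPath {
    let name: String
    let hops: [Party]
}

// Assumption: only the user's device and your own EU-hosted servers count as controlled.
let controlledParties: Set<Party> = [.userDevice, .ownEUServers]

/// True when every hop on the path is a party you control.
func isControlledEndToEnd(_ path: DataPath) -> Bool {
    path.hops.allSatisfy { controlledParties.contains($0) }
}

let onDevicePath = DataPath(name: "on-device", hops: [.userDevice])
let euEscalationPath = DataPath(name: "EU-hosted escalation", hops: [.userDevice, .ownEUServers])
let siriGeminiPath = DataPath(name: "Siri via Gemini", hops: [.userDevice, .apple, .google])

for path in [onDevicePath, euEscalationPath, siriGeminiPath] {
    print(path.name, isControlledEndToEnd(path))
}
```

The first two paths pass and the Siri path does not, which is the same line an auditor would expect to see on the diagram.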

The stack we would still ship today

The stack we would put a new mobile agent on this week is the same one we would have shipped in March:

  1. On-device intent classifier in Core ML, trained on your domain. Forty to ninety milliseconds, no network.
  2. On-device small language model (we usually pick a 3B or 4B quantised model via MLX or llama.cpp on iOS, depending on the device tier) for short generative work: summarising a receipt, drafting a one-line reply, filling a form field.
  3. EU-hosted cloud model (a mix of Mistral in Paris and self-hosted Llama variants for clients in regulated sectors) for any task the on-device model declines.
  4. Apple Intelligence as an optional accelerator for general queries you do not own. Never as the only path.
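The "escalate only when the local model declines" part of that list can be sketched as an ordered chain of layers, each of which either answers or returns nil. Everything here is a hypothetical stand-in: `AgentLayer`, the two model structs, and the prompt-length check that simulates a local model declining are assumptions for illustration, not real model calls.

```swift
/// A layer either answers a prompt or declines by returning nil.
protocol AgentLayer {
    var name: String { get }
    func answer(_ prompt: String) -> String?
}

struct OnDeviceModel: AgentLayer {
    let name = "on-device"
    // Hypothetical stand-in for "too big for the local model".
    let maxPromptLength: Int

    func answer(_ prompt: String) -> String? {
        prompt.count <= maxPromptLength ? "local: \(prompt)" : nil
    }
}

struct EUCloudModel: AgentLayer {
    let name = "eu-cloud"

    func answer(_ prompt: String) -> String? {
        "cloud: \(prompt)"
    }
}

/// Tries each layer in order and returns the first answer with the layer that produced it.
func respond(to prompt: String, layers: [any AgentLayer]) -> (layer: String, text: String)? {
    for layer in layers {
        if let text = layer.answer(prompt) {
            return (layer.name, text)
        }
    }
    return nil
}

let layers: [any AgentLayer] = [OnDeviceModel(maxPromptLength: 20), EUCloudModel()]

let short = respond(to: "summarise receipt", layers: layers)
let long = respond(to: "draft a reply to this three-page supplier contract", layers: layers)
print(short?.layer ?? "none")  // on-device
print(long?.layer ?? "none")   // eu-cloud
```

The order of the array is the policy: the cloud layer is only reached when the layer before it declines.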

The dispatcher between these layers is a small Swift function that looks roughly like this:

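enum Route { case onDevice, euCloud, appleIntelligence }

// `user` is passed in explicitly so the function does not rely on outside state.
// `User` is a placeholder type exposing `optedIntoAppleIntelligence`.
func route(for request: AgentRequest, user: User) -> Route {
    // Sensitive data stays on paths we control.
    if request.containsPII || request.isRegulated {
        return request.fitsOnDevice ? .onDevice : .euCloud
    }
    // Anything that must feel instant cannot wait on a round trip.
    if request.latencyBudgetMs < 500 {
        return .onDevice
    }
    // General queries may use Apple Intelligence, but only with user opt-in.
    if request.isGeneralKnowledge && user.optedIntoAppleIntelligence {
        return .appleIntelligence
    }
    // Everything else goes to the EU-hosted model.
    return .euCloud
}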

That dispatcher is the only thing that moved with the Gemini news. We added one branch. The on-device path stayed where it was.
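To see the branch order in action, here is a self-contained version with a few sample requests. The `AgentRequest` and `User` structs are assumptions: the field names mirror the snippet, but the types, default values, and sample requests are invented for illustration.

```swift
enum Route { case onDevice, euCloud, appleIntelligence }

// Hypothetical request and user types; only the field names come from the snippet.
struct AgentRequest {
    var containsPII = false
    var isRegulated = false
    var fitsOnDevice = true
    var latencyBudgetMs = 2_000
    var isGeneralKnowledge = false
}

struct User {
    var optedIntoAppleIntelligence = false
}

func route(for request: AgentRequest, user: User) -> Route {
    // Sensitive data is checked first, so no later branch can send it to a third party.
    if request.containsPII || request.isRegulated {
        return request.fitsOnDevice ? .onDevice : .euCloud
    }
    if request.latencyBudgetMs < 500 {
        return .onDevice
    }
    if request.isGeneralKnowledge && user.optedIntoAppleIntelligence {
        return .appleIntelligence
    }
    return .euCloud
}

let optedIn = User(optedIntoAppleIntelligence: true)
let optedOut = User()

// A dictated contract clause that is too long for the local model.
let contractClause = AgentRequest(containsPII: true, fitsOnDevice: false, isGeneralKnowledge: true)
// The receipt scan with its 400 ms budget.
let receiptScan = AgentRequest(latencyBudgetMs: 400)
// A general question with no sensitive content.
let generalQuestion = AgentRequest(isGeneralKnowledge: true)

print(route(for: contractClause, user: optedIn))    // euCloud
print(route(for: receiptScan, user: optedIn))       // onDevice
print(route(for: generalQuestion, user: optedIn))   // appleIntelligence
print(route(for: generalQuestion, user: optedOut))  // euCloud
```

The first case is the one that matters: even when the user has opted in and the request looks like general knowledge, the PII check runs first and the request stays on the EU path.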

Warning

If you are tempted to retire your on-device work because Apple Intelligence sounds good enough, ask your DPO whether they can draw the contract-text data path on a single A4 sheet. If the answer is no, you cannot ship that retirement.

The five-minute audit

When we built the on-device agent for a Dutch invoicing client last quarter, the thing we kept running into was the temptation to let the cloud model handle "just one more" edge case until the on-device path was vestigial. We solved it by drawing the dispatcher on a whiteboard before writing a line of code, and we now run that same exercise for every AI agent we put into a mobile app.

If you want to do the audit today: open your app, list every place a user expects an instant action, and put a red dot on each one that currently waits for a network call. Those dots are your on-device work, regardless of what runs behind Siri.
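If a whiteboard is not to hand, the same audit fits in a short list. This is a sketch with made-up entries: the `UserAction` type and the three example actions are hypothetical, and you would replace them with the actions in your own app.

```swift
// One row per user-facing action in the app (hypothetical examples).
struct UserAction {
    let name: String
    let mustFeelInstant: Bool
    let waitsOnNetwork: Bool
}

let actions = [
    UserAction(name: "categorise scanned receipt", mustFeelInstant: true, waitsOnNetwork: true),
    UserAction(name: "fill form field from voice", mustFeelInstant: true, waitsOnNetwork: false),
    UserAction(name: "export monthly report", mustFeelInstant: false, waitsOnNetwork: true),
]

// A "red dot" is an action the user expects to be instant that currently waits on a network call.
let redDots = actions.filter { $0.mustFeelInstant && $0.waitsOnNetwork }

for action in redDots {
    print("red dot:", action.name)
}
```

In this made-up list only the receipt categorisation gets a red dot; the export also waits on the network, but nobody expects it to feel like a button press.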

Key takeaway

Apple putting Gemini behind Siri is Apple's win, not your roadmap. The on-device path still exists because latency and GDPR never moved.

FAQ

Does Apple's Gemini-backed Siri replace the model I ship in my own app?

No. The Gemini integration only changes what Siri itself can do for general queries. Your in-app agent stack, including the on-device model and your own cloud path, is untouched and still your responsibility.

Can I use Apple Intelligence as my only LLM path?

For consumer apps with no regulated data, yes. For Dutch B2B apps handling invoices, contracts, or customer PII, no. You still need an EU-hosted path you control end to end.

What is a realistic on-device latency budget for an instant action in 2026?

Plan for 300 to 500 milliseconds end to end. That rules out any cloud round trip on commuter mobile networks, so on-device intent classification stays mandatory for anything that should feel like a button press.

Which on-device model do you usually pick on iOS?

A 3B or 4B quantised model running through MLX or llama.cpp on the device tier the customer base actually uses. We size it down for older iPhones and only escalate to the cloud when the local model cannot finish the job.

ai agents · mobile apps · architecture · strategy · integrations
