Apple's on-device AI: the quiet win for small app teams

Apple's cheaper on-device AI bet matters more to small Dutch mobile-app founders than the Gemini-Siri headline. Here are the three Core AI hooks we are wiring in.

Jacob Molkenboer · Founder · A Brand New Company · 13 Jan 2025 · 6 min

A dispatcher in Pernis is staring at a screen that should show fourteen container moves from the night shift. It shows six. The other eight live as voice memos on a driver's phone, half in Dutch, half in Sranan Tongo, all under diesel hum. The founder who built her app rang us at 9:30 the next morning. She wanted to know if we could "do AI" on those clips without paying €1,800 a month for cloud transcription and a vector database that nobody asked for.

That call is why we think last week's Gemini-Siri headline missed the real story for the kind of app teams we build for.

The headline everyone read

Apple confirmed it is letting Google's Gemini run as a Siri fallback on harder prompts. Reasonable news, but irrelevant to most Dutch mobile teams under €5M in revenue. Their users are not asking Siri to plan a holiday. The interesting part of the same announcement sat three slides later: Apple opened its on-device foundation model to third-party apps. Free at the point of use. No per-token billing. No minimum spend. No data leaving the device.

That is the part that changes our quoting spreadsheet.

What on-device buys a small founder

The small teams we work with rarely fail because their model is not smart enough. They fail because the cloud bill turns a €19 monthly subscription into a loss. We have watched it happen three times in eighteen months. A team ships a "smart inbox" feature, traffic spikes, the LLM invoice arrives, the founder kills the feature, the App Store reviews tank.

On-device inference breaks that loop. The model is already on the user's phone. You do not pay for it. Latency is local. And if the German courts keep going where they went last week, with judges ruling that the platform serving a false AI answer carries liability for it, the question of where your inference happens stops being a technical detail and starts being a legal one. A model that ran on the customer's device, on data they own, is a very different artefact in a courtroom than a prompt you sent to someone else's GPU.

Three Core AI hooks we are wiring into a Rotterdam logistics app

The Pernis dispatcher's app is the one we are rebuilding now. It runs on a fleet of iPhone 14 and 15 handsets that the drivers already carry. Only Apple Intelligence-capable models (iPhone 15 Pro and later) meet the floor for Apple's Foundation Models framework, so the hooks below target those handsets. Here is what is going in.

Voice memo to container move

The driver records a quick clip when the scanner does not catch a move. The phone transcribes it locally with Speech, then hands the text to the on-device model with a typed schema. The output is a row in the move log, no round trip.

import FoundationModels

@Generable
struct ContainerMove {
    @Guide(description: "ISO 6346 container id, 4 letters + 7 digits")
    let containerID: String
    @Guide(description: "loaded, unloaded, moved, damaged")
    let action: String
    @Guide(description: "Bay or stack reference, e.g. B14-03")
    let location: String
}

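// `transcript` is the text produced by the on-device Speech step.
let session = LanguageModelSession(
    instructions: "Extract one container move from a driver's voice memo. Dutch or English."
)

// Guided generation: the response is decoded straight into ContainerMove.
let move = try await session.respond(
    to: transcript,
    generating: ContainerMove.self
).content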

The first pass got 89% of moves right on a sample of 230 real memos. The misses were almost all background noise garbling the container ID. A second prompt that asks the model to flag low-confidence IDs for human review took that to 97% actionable: the move was either extracted correctly or routed to a person.
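One cheap complement to that review flag, sketched here as an assumption rather than something the team describes shipping: ISO 6346 container IDs end in a check digit, so a garbled ID can often be caught deterministically before it reaches the move log. The function name below is hypothetical.

```swift
/// ISO 6346 letter values: A = 10, then counting up while skipping multiples of 11.
let iso6346LetterValues: [Character: Int] = {
    var table: [Character: Int] = [:]
    var value = 10
    for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" {
        if value % 11 == 0 { value += 1 }
        table[letter] = value
        value += 1
    }
    return table
}()

/// Returns true when the 11-character ID (4 letters + 7 digits) has a matching check digit.
func hasValidCheckDigit(_ rawID: String) -> Bool {
    // Tolerate lowercase, spaces, and hyphens, which transcripts tend to introduce.
    let id = Array(rawID.uppercased().filter { !$0.isWhitespace && $0 != "-" })
    guard id.count == 11 else { return false }

    var sum = 0
    for (position, character) in id.prefix(10).enumerated() {
        let value: Int
        if position < 4 {
            guard let v = iso6346LetterValues[character] else { return false }
            value = v
        } else {
            guard character.isASCII, let v = character.wholeNumberValue else { return false }
            value = v
        }
        sum += value * (1 << position)   // weight is 2^position
    }

    guard id[10].isASCII, let check = id[10].wholeNumberValue else { return false }
    return (sum % 11) % 10 == check
}
```

An ID that fails this check could go to the same human-review queue as a model-flagged one. Passing the check does not prove the ID is the right container, only that it is well formed.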

Bill of lading photo to line items

Drivers photograph the bill at pickup. Vision handles the OCR, then the on-device model normalises the result into HS codes and weights against a small lookup the app already carries. The whole pipeline runs while the truck is still at the loading dock. No connectivity needed, which matters on the southern terminals where 4G drops to one bar.
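A minimal sketch of the two deterministic ends of that pipeline, under stated assumptions: Vision's `VNRecognizeTextRequest` for the OCR step, and a hypothetical `resolvedHSCode` helper that accepts a proposed code only if it exists in the bundled lookup. The model call in between is omitted, and the sample codes are illustrative.

```swift
import Foundation
#if canImport(Vision)
import Vision
import CoreGraphics

/// OCR step: returns the recognised text lines from a photographed bill.
func recognisedLines(in image: CGImage) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])   // synchronous, so call it off the main thread
    return (request.results ?? []).compactMap { $0.topCandidates(1).first?.string }
}
#endif

/// Reduces a proposed code to its 6-digit HS subheading and checks it
/// against the lookup the app carries. Returns nil for anything unknown.
func resolvedHSCode(_ proposed: String, knownCodes: Set<String>) -> String? {
    let digits = proposed.filter { $0.isASCII && $0.isNumber }
    guard digits.count >= 6 else { return nil }
    let code = String(digits.prefix(6))
    return knownCodes.contains(code) ? code : nil
}

// Hypothetical bundled lookup, for illustration only.
let bundledCodes: Set<String> = ["090121", "870323"]
let accepted = resolvedHSCode("0901.21", knownCodes: bundledCodes)
let rejected = resolvedHSCode("9999.99", knownCodes: bundledCodes)
```

Checking against the lookup means a code the model invents comes back as nil and never lands in a line item.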

Dispatcher inbox summary

The dispatcher receives roughly 180 messages per shift across email, WhatsApp, and the in-app channel. The model groups them by container reference, surfaces the three that mention damage or delay, and writes a single sentence per cluster. Three minutes saved per cluster. The dispatcher we tested it with went from skipping the inbox to clearing it twice a shift.
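The grouping half of that feature does not have to go through the model at all. The sketch below is one way to do it, not a description of the shipped app: it assumes container references look like ISO 6346 IDs, uses a hypothetical keyword list for damage and delay, and leaves only the one-sentence summary per cluster to the on-device model. All type and function names are illustrative.

```swift
import Foundation

struct InboxMessage {
    let channel: String
    let body: String
}

struct Cluster {
    let reference: String
    let messages: [InboxMessage]
    let urgent: Bool
}

/// Finds the first token shaped like a container ID (4 letters + 7 digits).
func containerReference(in text: String) -> String? {
    let tokens = text.uppercased().split(whereSeparator: { !($0.isLetter || $0.isNumber) })
    let match = tokens.first { token in
        token.count == 11
            && token.prefix(4).allSatisfy { $0.isASCII && $0.isLetter }
            && token.suffix(7).allSatisfy { $0.isASCII && $0.isNumber }
    }
    return match.map { String($0) }
}

// Hypothetical keyword list, English and Dutch.
let urgentKeywords = ["damage", "delay", "schade", "vertraging"]

/// Groups messages by container reference and sorts urgent clusters first.
func clusters(from messages: [InboxMessage]) -> [Cluster] {
    let grouped = Dictionary(grouping: messages) { containerReference(in: $0.body) ?? "unassigned" }
    return grouped.map { entry -> Cluster in
        let urgent = entry.value.contains { message in
            let body = message.body.lowercased()
            return urgentKeywords.contains { body.contains($0) }
        }
        return Cluster(reference: entry.key, messages: entry.value, urgent: urgent)
    }
    .sorted { ($0.urgent ? 0 : 1, $0.reference) < ($1.urgent ? 0 : 1, $1.reference) }
}

let sampleInbox = [
    InboxMessage(channel: "email", body: "CSQU3054383 delayed at gate 4"),
    InboxMessage(channel: "whatsapp", body: "schade aan csqu3054383, foto volgt"),
    InboxMessage(channel: "in-app", body: "MSKU1234565 loaded on B14-03"),
]
let inboxClusters = clusters(from: sampleInbox)
```

Each cluster's messages can then be passed to the model as a single summarisation prompt, which keeps the prompts short.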

The trade-offs that still bite

This is not a free lunch.

The on-device model is small. It will not answer open-ended questions about international shipping law, and it should not try. When we tested it as an open chatbot, it hallucinated tariff codes confidently. The right mental model is "fast structured extractor and summariser," not "in-pocket expert."

Older iPhones are out. The framework needs an A17 Pro or newer A-series chip, or an M-series chip, which in practice means iPhone 15 Pro and later plus M-series iPads and the A17 Pro iPad mini. For a B2B fleet that is fine because you control the hardware. For a consumer app you will need a cloud fallback for the older half of your install base, and the moment you have that fallback you also have all the cost and privacy questions the on-device path was meant to dodge. Pick your audience first.

Warning

The Foundation Models framework requires iOS 26 or later. Anything you ship before September has to gate the feature behind an availability check, and you will need a graceful fallback for the long tail of users who never update.
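A gate along those lines might look like the sketch below. It assumes `SystemLanguageModel.default.availability` as the runtime check. The `InferenceRoute` type and both functions are hypothetical names, and the decision logic is kept in a pure function so it can be unit-tested off-device.

```swift
#if canImport(FoundationModels)
import FoundationModels
#endif

/// Where a request should be handled. Illustrative type, not part of any SDK.
enum InferenceRoute: Equatable {
    case onDevice
    case fallback(reason: String)
}

/// Pure decision logic: no framework calls, so it runs anywhere.
func route(frameworkPresent: Bool, unavailableReason: String?) -> InferenceRoute {
    guard frameworkPresent else {
        return .fallback(reason: "Foundation Models framework not present")
    }
    if let reason = unavailableReason {
        return .fallback(reason: reason)
    }
    return .onDevice
}

/// Reads the real device state and feeds it to the pure function above.
func currentRoute() -> InferenceRoute {
    #if canImport(FoundationModels)
    if #available(iOS 26.0, macOS 26.0, visionOS 26.0, *) {
        let availability = SystemLanguageModel.default.availability
        if case .available = availability {
            return route(frameworkPresent: true, unavailableReason: nil)
        }
        // Ineligible hardware, Apple Intelligence switched off, or model not ready yet.
        return route(frameworkPresent: true, unavailableReason: String(describing: availability))
    }
    #endif
    return route(frameworkPresent: false, unavailableReason: nil)
}
```

What the fallback branch does (hide the feature, queue the memo for later, or call a cloud model) is a product decision the code cannot make for you.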

What we are betting on

The interesting future is not Siri talking to Gemini. It is the thousand small apps that finally get to add a structured-extraction or summarisation feature without taking on a recurring per-user cloud cost. Last year that feature cost €4 to €11 per active user per month at scale and most founders cut it. This year it costs zero, and the floor for "an AI feature is in the budget" drops to "you have an iPhone 15 Pro user."

When we wired this into the Rotterdam dispatcher's app, the part that surprised us was not the model quality. It was how much of the surrounding system became simpler. No proxy server. No rate-limit handling. No prompt-injection threat model for messages going up to a third-party API. If you are scoping AI agents for a mobile app this quarter, start by asking what the on-device model can already do, and only reach for the cloud where it genuinely cannot.

Five-minute audit you can run today: open your app's planned AI features and tag each one as "structured extraction," "summarisation," "open-ended generation," or "reasoning over private data." The first two and the last one belong on the device now. Only the third still needs the cloud, and probably less often than you think.
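If it helps to make the audit mechanical, the tagging rule fits in a few lines. This is only a sketch of the rule as stated above; the roadmap entries are hypothetical.

```swift
enum FeatureKind {
    case structuredExtraction
    case summarisation
    case openEndedGeneration
    case reasoningOverPrivateData
}

enum Placement {
    case onDevice
    case cloud
}

/// The audit rule: only open-ended generation defaults to the cloud.
func defaultPlacement(for kind: FeatureKind) -> Placement {
    switch kind {
    case .structuredExtraction, .summarisation, .reasoningOverPrivateData:
        return .onDevice
    case .openEndedGeneration:
        return .cloud
    }
}

// Hypothetical roadmap, for illustration only.
let roadmap: [(feature: String, kind: FeatureKind)] = [
    ("Voice memo to container move", .structuredExtraction),
    ("Dispatcher inbox summary", .summarisation),
    ("Ask-anything shipping chatbot", .openEndedGeneration),
]

let needsCloud = roadmap
    .filter { defaultPlacement(for: $0.kind) == .cloud }
    .map { $0.feature }
```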

Key takeaway

For most apps under €5M in revenue, on-device inference is the right default. Reach for cloud only when the device model genuinely cannot deliver.

FAQ

What is Apple's Foundation Models framework?

A Swift API that lets third-party apps call Apple's on-device large language model directly, with typed output via @Generable, no network call, and no per-token billing.

Which iPhones can run on-device AI features?

Apple Intelligence and the Foundation Models framework require an A17 Pro chip or newer, or an M-series chip. In practice that means iPhone 15 Pro and later, plus M-series iPads and the A17 Pro iPad mini.

Should a small app still use cloud LLMs at all?

Yes. For open-ended generation and reasoning over large private corpora, the device model is too small. Use cloud where it genuinely earns its cost, and on-device for everything else.

Is on-device inference really free?

Apple does not bill per token. You still pay for development time, the device itself, and any cloud fallback you add for older hardware, but the marginal cost per inference is zero.
