Inference
100% on-device.
No cloud. No network latency. No lock-in.
Runs entirely on the device
DeviceAI embeds inference directly into your Android and iOS apps. Models run locally: no network calls, no cold starts, no usage bills. Works fully offline.
How It Works
From SDK to inference in six steps
Integrate the SDK
One dependency in your Kotlin, Swift, Flutter, or React Native project. Call DeviceAI.initialize(apiKey).
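In Kotlin on Android, that integration is a single call at app startup. A minimal sketch, assuming only the documented `DeviceAI.initialize(apiKey)` entry point; the Application subclass and key string are placeholders.

```kotlin
import android.app.Application

// Illustrative only: the Application subclass and key string are placeholders.
// Only DeviceAI.initialize(apiKey) comes from the step above.
class MyApp : Application() {
    override fun onCreate() {
        super.onCreate()
        // One call at process start; registration, manifest assignment, and
        // model download then happen in the background.
        DeviceAI.initialize(apiKey = "YOUR_API_KEY")
    }
}
```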
Device registers
On first launch, the SDK profiles the device — RAM, NPU, CPU cores — and registers with the control plane. A capability tier is assigned.
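The tiering logic itself lives in the control plane and isn't spelled out here, but a sketch shows the shape of the decision. The thresholds, field names, and tier names below are illustrative assumptions, not DeviceAI's actual policy.

```kotlin
// Hypothetical tier assignment: thresholds and tier names are assumptions
// for illustration, not the control plane's real rules.
enum class CapabilityTier { LOW, MID, HIGH }

data class DeviceProfile(val ramGb: Int, val hasNpu: Boolean, val cpuCores: Int)

fun assignTier(profile: DeviceProfile): CapabilityTier = when {
    profile.hasNpu && profile.ramGb >= 8        -> CapabilityTier.HIGH
    profile.ramGb >= 6 && profile.cpuCores >= 6 -> CapabilityTier.MID
    else                                        -> CapabilityTier.LOW
}
```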
Manifest assigned
The control plane evaluates your rollout rules, canary cohort, and device tier, then returns an Ed25519-signed manifest listing which model to load.
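Signature checking can be done with standard crypto APIs. The sketch below assumes a runtime with an Ed25519 `Signature` provider (JDK 15+, or an equivalent provider on Android); how the SDK actually pins and distributes the signing key is not shown here.

```kotlin
import java.security.KeyFactory
import java.security.Signature
import java.security.spec.X509EncodedKeySpec

// Verify the control plane's Ed25519 signature over the raw manifest bytes
// before trusting its contents. Key distribution is out of scope for this sketch.
fun manifestIsAuthentic(manifest: ByteArray, signature: ByteArray, publicKeyDer: ByteArray): Boolean {
    val publicKey = KeyFactory.getInstance("Ed25519")
        .generatePublic(X509EncodedKeySpec(publicKeyDer))
    return Signature.getInstance("Ed25519").run {
        initVerify(publicKey)
        update(manifest)
        verify(signature)
    }
}
```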
Model downloads
Models stream from Cloudflare R2 in the background — chunked, resumable, SHA-256 verified before use.
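A rough sketch of that resumable, checksum-gated pattern using plain HttpURLConnection; the SDK's actual transport, chunking, and retry policy are assumptions here, but the Range-based resume and the SHA-256 gate mirror the description above.

```kotlin
import java.io.File
import java.io.FileOutputStream
import java.net.HttpURLConnection
import java.net.URL
import java.security.MessageDigest

// Resume a partial download with an HTTP Range request, then refuse to use
// the file unless its SHA-256 matches the manifest's expected digest.
fun downloadModel(url: String, dest: File, expectedSha256: String) {
    val alreadyHave = if (dest.exists()) dest.length() else 0L
    val conn = URL(url).openConnection() as HttpURLConnection
    if (alreadyHave > 0) conn.setRequestProperty("Range", "bytes=$alreadyHave-")

    conn.inputStream.use { input ->
        FileOutputStream(dest, /* append = */ alreadyHave > 0).use { output ->
            input.copyTo(output)
        }
    }

    // Verify the complete file before the model is ever loaded.
    val digest = MessageDigest.getInstance("SHA-256")
    dest.inputStream().use { input ->
        val buffer = ByteArray(64 * 1024)
        var read = input.read(buffer)
        while (read != -1) {
            digest.update(buffer, 0, read)
            read = input.read(buffer)
        }
    }
    val actual = digest.digest().joinToString("") { "%02x".format(it) }
    require(actual.equals(expectedSha256, ignoreCase = true)) {
        "Checksum mismatch for ${dest.name}; model will not be loaded"
    }
}
```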
Inference runs
Your app calls the DeviceAI SDK. Inference runs fully on-device via whisper.cpp, llama.cpp, or ONNX Runtime. Zero cloud round-trips.
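What that call looks like depends on the model. The `session` and `generate` names below are hypothetical placeholders, not the documented SDK surface; the point is that the call resolves locally against the already-downloaded weights.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// Hypothetical call shape: DeviceAI.session() and generate() are placeholder
// names, not the documented API. No network request is made at inference time.
suspend fun summarize(note: String): String = withContext(Dispatchers.Default) {
    val session = DeviceAI.session(model = "summarizer")
    session.generate(prompt = "Summarize in one sentence: $note")
}
```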
Telemetry flows
Latency, memory, and error events buffer in memory and flush every 30 seconds. View live in the dashboard. Set auto-rollback thresholds.
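The buffer-and-flush loop is simple to picture. In the sketch below, the event fields and the flush callback are illustrative assumptions; only the 30-second interval comes from the description above.

```kotlin
import java.util.concurrent.ConcurrentLinkedQueue
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.isActive
import kotlinx.coroutines.launch

// Events accumulate in memory and are shipped in batches every 30 seconds.
// Field names and the flush callback are illustrative, not the SDK's schema.
data class TelemetryEvent(val name: String, val latencyMs: Long, val memoryMb: Int, val isError: Boolean)

class TelemetryBuffer(
    private val scope: CoroutineScope,
    private val flush: suspend (List<TelemetryEvent>) -> Unit,
) {
    private val pending = ConcurrentLinkedQueue<TelemetryEvent>()

    fun record(event: TelemetryEvent) {
        pending.add(event)
    }

    fun start() = scope.launch {
        while (isActive) {
            delay(30_000) // flush interval from the step above
            val batch = generateSequence { pending.poll() }.toList()
            if (batch.isNotEmpty()) flush(batch)
        }
    }
}
```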