Inference
100% on-device.
No cloud. No network latency. No lock-in.
Runs entirely on the device
DeviceAI embeds inference directly into your Android and iOS apps. Models run locally: no network calls, no cold starts, no usage bills. Works fully offline.
How It Works
From SDK to inference in six steps
Integrate the SDK
One dependency in your Kotlin, Swift, Flutter, or React Native project. Call DeviceAI.initialize(apiKey).
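In Kotlin on Android, that integration is a single call at app startup. A minimal sketch, assuming only the documented `DeviceAI.initialize(apiKey)` entry point; the Application subclass and key string are placeholders.

```kotlin
import android.app.Application

// Illustrative only: the Application subclass and key string are placeholders.
// Only DeviceAI.initialize(apiKey) comes from the step above.
class MyApp : Application() {
    override fun onCreate() {
        super.onCreate()
        // One call at process start; registration, manifest assignment, and
        // model download then happen in the background.
        DeviceAI.initialize(apiKey = "YOUR_API_KEY")
    }
}
```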
Device registers
On first launch, the SDK profiles the device — RAM, NPU, CPU cores — and registers with the control plane. A capability tier is assigned.
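The tiering logic itself lives in the control plane and isn't spelled out here, but a sketch shows the shape of the decision. The thresholds, field names, and tier names below are illustrative assumptions, not DeviceAI's actual policy.

```kotlin
// Hypothetical tier assignment: thresholds and tier names are assumptions
// for illustration, not the control plane's real rules.
enum class CapabilityTier { LOW, MID, HIGH }

data class DeviceProfile(val ramGb: Int, val hasNpu: Boolean, val cpuCores: Int)

fun assignTier(profile: DeviceProfile): CapabilityTier = when {
    profile.hasNpu && profile.ramGb >= 8        -> CapabilityTier.HIGH
    profile.ramGb >= 6 && profile.cpuCores >= 6 -> CapabilityTier.MID
    else                                        -> CapabilityTier.LOW
}
```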
Manifest assigned
The control plane evaluates your rollout rules, canary cohort, and device tier, then returns an Ed25519-signed manifest listing which model to load.
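Signature checking can be done with standard crypto APIs. The sketch below assumes a runtime with an Ed25519 `Signature` provider (JDK 15+, or an equivalent provider on Android); how the SDK actually pins and distributes the signing key is not shown here.

```kotlin
import java.security.KeyFactory
import java.security.Signature
import java.security.spec.X509EncodedKeySpec

// Verify the control plane's Ed25519 signature over the raw manifest bytes
// before trusting its contents. Key distribution is out of scope for this sketch.
fun manifestIsAuthentic(manifest: ByteArray, signature: ByteArray, publicKeyDer: ByteArray): Boolean {
    val publicKey = KeyFactory.getInstance("Ed25519")
        .generatePublic(X509EncodedKeySpec(publicKeyDer))
    return Signature.getInstance("Ed25519").run {
        initVerify(publicKey)
        update(manifest)
        verify(signature)
    }
}
```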
Model downloads
Models stream from Cloudflare R2 in the background — chunked, resumable, SHA-256 verified before use.
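A rough sketch of that resumable, checksum-gated pattern using plain HttpURLConnection; the SDK's actual transport, chunking, and retry policy are assumptions here, but the Range-based resume and the SHA-256 gate mirror the description above.

```kotlin
import java.io.File
import java.io.FileOutputStream
import java.net.HttpURLConnection
import java.net.URL
import java.security.MessageDigest

// Resume a partial download with an HTTP Range request, then refuse to use
// the file unless its SHA-256 matches the manifest's expected digest.
fun downloadModel(url: String, dest: File, expectedSha256: String) {
    val alreadyHave = if (dest.exists()) dest.length() else 0L
    val conn = URL(url).openConnection() as HttpURLConnection
    if (alreadyHave > 0) conn.setRequestProperty("Range", "bytes=$alreadyHave-")

    conn.inputStream.use { input ->
        FileOutputStream(dest, /* append = */ alreadyHave > 0).use { output ->
            input.copyTo(output)
        }
    }

    // Verify the complete file before the model is ever loaded.
    val digest = MessageDigest.getInstance("SHA-256")
    dest.inputStream().use { input ->
        val buffer = ByteArray(64 * 1024)
        var read = input.read(buffer)
        while (read != -1) {
            digest.update(buffer, 0, read)
            read = input.read(buffer)
        }
    }
    val actual = digest.digest().joinToString("") { "%02x".format(it) }
    require(actual.equals(expectedSha256, ignoreCase = true)) {
        "Checksum mismatch for ${dest.name}; model will not be loaded"
    }
}
```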
Inference runs
Your app calls the DeviceAI SDK. Inference runs fully on-device via whisper.cpp, llama.cpp, or ONNX Runtime. Zero cloud round-trips.
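What that call looks like depends on the model. The `session` and `generate` names below are hypothetical placeholders, not the documented SDK surface; the point is that the call resolves locally against the already-downloaded weights.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// Hypothetical call shape: DeviceAI.session() and generate() are placeholder
// names, not the documented API. No network request is made at inference time.
suspend fun summarize(note: String): String = withContext(Dispatchers.Default) {
    val session = DeviceAI.session(model = "summarizer")
    session.generate(prompt = "Summarize in one sentence: $note")
}
```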
Telemetry flows
Latency, memory, and error events buffer in memory and flush every 30 seconds. View live in the dashboard. Set auto-rollback thresholds.
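The buffer-and-flush loop is simple to picture. In the sketch below, the event fields and the flush callback are illustrative assumptions; only the 30-second interval comes from the description above.

```kotlin
import java.util.concurrent.ConcurrentLinkedQueue
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.isActive
import kotlinx.coroutines.launch

// Events accumulate in memory and are shipped in batches every 30 seconds.
// Field names and the flush callback are illustrative, not the SDK's schema.
data class TelemetryEvent(val name: String, val latencyMs: Long, val memoryMb: Int, val isError: Boolean)

class TelemetryBuffer(
    private val scope: CoroutineScope,
    private val flush: suspend (List<TelemetryEvent>) -> Unit,
) {
    private val pending = ConcurrentLinkedQueue<TelemetryEvent>()

    fun record(event: TelemetryEvent) {
        pending.add(event)
    }

    fun start() = scope.launch {
        while (isActive) {
            delay(30_000) // flush interval from the step above
            val batch = generateSequence { pending.poll() }.toList()
            if (batch.isNotEmpty()) flush(batch)
        }
    }
}
```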