OpenJarvis: Personal AI, On Personal Devices

📅 2026-05-16
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the trade-off between privacy and performance in personal AI systems: cloud-hosted large models expose sensitive data, while purely local deployment suffers a 25-39 percentage point drop in accuracy. The authors propose OpenJarvis, a decoupled personal AI architecture that decomposes the system into five independently optimizable primitives: intelligence, engine, agent, tools & memory, and learning. According to the summary, this enables, for the first time, both modular decoupling and end-to-end optimizability. Using typed specifications and an LLM-guided search in which local and cloud resources collaborate, the framework jointly optimizes prompts, tool descriptions, and memory configurations. Evaluated across eight benchmarks, OpenJarvis matches or exceeds cloud-only models on four of them and trails the best cloud baseline by only 3.2 percentage points on average, while reducing marginal API cost by approximately 800× and cutting end-to-end latency by 4×.
📝 Abstract
Personal AI stacks, like OpenClaw and Hermes Agent, are becoming central to daily work, yet they route nearly every query (often over sensitive local data) to cloud-hosted frontier models. Replacing frontier models with local models inside existing stacks does not work: swapping Claude Opus 4.6 for Qwen3.5-9B drops accuracy by 25-39 pp across personal AI tasks like PinchBench and GAIA. Existing stacks bundle agentic prompts, tool descriptions, memory configuration, and runtime settings around a specific cloud model. Only the prompts can be tuned, and state-of-the-art prompt optimizers close just 5 pp of the local-cloud gap on their own. This motivates a decomposed personal AI stack: one that exposes individual primitives which can be optimized individually or jointly to close the local-cloud gap. We present OpenJarvis, an architecture that represents a personal AI system as a typed spec over five primitives: Intelligence, Engine, Agents, Tools & Memory, and Learning. Each primitive is an independently editable field, making the stack end-to-end optimizable and measurable against accuracy, cost, and latency. Towards closing the local-cloud gap without surrendering local-model properties, OpenJarvis introduces LLM-guided spec search, a local-cloud collaboration in which frontier cloud models propose edits across the spec at search time, only non-regressing edits are accepted, and the resulting spec runs entirely on-device at inference time. With LLM-guided spec search, on-device specs match or exceed cloud accuracy on 4 of 8 benchmarks and land within 3.2 pp of the best cloud baseline on average. They also reduce marginal API cost by ~800x and end-to-end latency by 4x.
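The typed spec and the accept-only-non-regressing search loop described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `Spec` field types, the `spec_search` function, the single scalar score (the abstract measures accuracy, cost, and latency), and the toy `toy_propose` and `toy_evaluate` stand-ins for the cloud proposer and the benchmark harness.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable

@dataclass(frozen=True)
class Spec:
    """Hypothetical typed spec: one editable field per primitive."""
    intelligence: str   # which local model runs on-device
    engine: str         # runtime settings
    agents: str         # agentic prompt
    tools_memory: str   # tool descriptions and memory configuration
    learning: str       # learning settings

Edit = dict[str, str]  # field name -> proposed new value

def spec_search(
    spec: Spec,
    propose: Callable[[Spec, float], Iterable[Edit]],
    evaluate: Callable[[Spec], float],
    rounds: int,
) -> tuple[Spec, float]:
    """Greedy search that keeps an edit only if the score does not drop."""
    best_score = evaluate(spec)
    for _ in range(rounds):
        for edit in propose(spec, best_score):
            candidate = replace(spec, **edit)   # apply the edit to a copy
            score = evaluate(candidate)
            if score >= best_score:             # accept only non-regressing edits
                spec, best_score = candidate, score
    return spec, best_score

def toy_evaluate(spec: Spec) -> float:
    """Toy score (assumption): fraction of fields marked as tuned."""
    fields = (spec.intelligence, spec.engine, spec.agents,
              spec.tools_memory, spec.learning)
    return sum(value.endswith("+tuned") for value in fields) / len(fields)

def toy_propose(spec: Spec, score: float) -> list[Edit]:
    """Toy proposer (assumption): stands in for a cloud model suggesting edits."""
    return [
        {"agents": "concise prompt+tuned"},         # improves the score
        {"agents": "verbose prompt"},               # regresses, so it is rejected
        {"tools_memory": "short tool docs+tuned"},  # improves the score
    ]

start = Spec("local-model", "default", "default", "default", "default")
final, final_score = spec_search(start, toy_propose, toy_evaluate, rounds=1)
print(final, final_score)
```

In this sketch the proposer is consulted only inside `spec_search`; the returned `Spec` is plain data that could then be run without further calls to it, which mirrors the abstract's split between cloud involvement at search time and on-device execution at inference time.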
Problem

Research questions and friction points this paper is trying to address.

personal AI
local models
cloud-model gap
on-device inference
AI stack optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

decomposed AI stack
LLM-guided spec search
on-device inference
personal AI
local-cloud gap