AI Summary
This work addresses the trade-off between privacy and performance in personal AI systems: cloud-hosted frontier models expose sensitive data, while swapping a local model into an existing stack costs 25–39 percentage points of accuracy. The authors propose OpenJarvis, a decomposed personal AI architecture that represents the system as five independently optimizable primitives: intelligence, engine, agents, tools & memory, and learning. According to the summary, this is the first design to combine modular decoupling with end-to-end optimizability. Using typed specifications and an LLM-guided search in which local and cloud models collaborate, the framework jointly optimizes prompts, tool descriptions, and memory configurations. Evaluated across eight benchmarks, OpenJarvis matches or exceeds cloud-only models on four and trails the best cloud baseline by only 3.2 percentage points on average, while reducing API costs by approximately 800× and cutting end-to-end latency by 4×.
Abstract
Personal AI stacks, like OpenClaw and Hermes Agent, are becoming central to daily work, yet they route nearly every query (often over sensitive local data) to cloud-hosted frontier models. Replacing frontier models with local models inside existing stacks does not work: swapping Claude Opus 4.6 for Qwen3.5-9B drops accuracy by 25-39 pp across personal AI tasks like PinchBench and GAIA. Existing stacks bundle agentic prompts, tool descriptions, memory configuration, and runtime settings around a specific cloud model. Only the prompts can be tuned, and state-of-the-art prompt optimizers close just 5 pp of the local-cloud gap on their own. This motivates a decomposed personal AI stack: one that exposes individual primitives which can be optimized individually or jointly to close the local-cloud gap. We present OpenJarvis, an architecture that represents a personal AI system as a typed spec over five primitives: Intelligence, Engine, Agents, Tools & Memory, and Learning. Each primitive is an independently editable field, making the stack end-to-end optimizable and measurable against accuracy, cost, and latency. Towards closing the local-cloud gap without surrendering local-model properties, OpenJarvis introduces LLM-guided spec search, a local-cloud collaboration in which frontier cloud models propose edits across the spec at search time, only non-regressing edits are accepted, and the resulting spec runs entirely on-device at inference time. With LLM-guided spec search, on-device specs match or exceed cloud accuracy on 4 of 8 benchmarks and land within 3.2 pp of the best cloud baseline on average. They also reduce marginal API cost by ~800x and end-to-end latency by 4x.
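The abstract's two core ideas, a typed spec with one editable field per primitive and a search that keeps only non-regressing edits, can be illustrated with a minimal Python sketch. This is not the OpenJarvis API: the names `Spec`, `spec_search`, `toy_propose`, and `toy_score` are hypothetical, the proposer is a hard-coded stand-in for a frontier cloud model, and the single scalar score is a simplifying assumption (the abstract measures specs against accuracy, cost, and latency).

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable

@dataclass(frozen=True)
class Spec:
    """Hypothetical typed spec: one independently editable field per primitive."""
    intelligence: str   # which local model to run
    engine: str         # runtime settings
    agent_prompt: str   # agentic prompt
    tools_memory: str   # tool descriptions and memory configuration
    learning: str       # learning policy

# An edit is a (field name, new value) pair.
Edit = tuple[str, str]

def spec_search(
    spec: Spec,
    propose: Callable[[Spec], Iterable[Edit]],
    score: Callable[[Spec], float],
    rounds: int = 2,
) -> tuple[Spec, float]:
    """Greedy search that accepts an edit only if the score does not regress."""
    best = score(spec)
    for _ in range(rounds):
        for field_name, value in propose(spec):
            candidate = replace(spec, **{field_name: value})
            candidate_score = score(candidate)
            if candidate_score >= best:  # non-regressing edits only
                spec, best = candidate, candidate_score
    return spec, best

def toy_propose(spec: Spec) -> list[Edit]:
    # Stand-in for a cloud model proposing edits at search time (assumption).
    return [
        ("agent_prompt", "Work step by step."),
        ("tools_memory", ""),  # a harmful edit that the search should reject
    ]

def toy_score(spec: Spec) -> float:
    # Stand-in for a benchmark evaluation run on-device (assumption).
    points = 0.0
    if "step by step" in spec.agent_prompt:
        points += 1.0
    if spec.tools_memory:
        points += 1.0
    return points

start = Spec(
    intelligence="Qwen3.5-9B",
    engine="default",
    agent_prompt="Answer the question.",
    tools_memory="search: look up local files",
    learning="none",
)
final, best = spec_search(start, toy_propose, toy_score)
print(final.agent_prompt, best)  # Work step by step. 2.0
```

In this sketch the proposer is only called inside `spec_search`, which mirrors the abstract's split between search time (cloud models propose edits) and inference time (the accepted spec runs entirely on-device). The frozen dataclass keeps every candidate an immutable copy, so a rejected edit never leaks into the accepted spec.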