🤖 AI Summary
This work addresses execution stability and control coordination when building reliable autonomous agents on smartphones, an environment that is constrained, dynamic, and state-volatile. The authors propose a hierarchical agent architecture that decouples high-level probabilistic reasoning by large language models (LLMs) from low-level deterministic control interfaces. Using this system as a case study, they distill design principles for mobile-oriented LLM runtimes, emphasizing coordination between probabilistic planning and deterministic execution. The authors report that this separation improves agent stability and reproducibility on real devices. The open-source implementation provides a foundation for future research, and the work identifies three key challenges: efficiency, adaptability, and stability.
📝 Abstract
Smartphones represent a uniquely challenging environment for agentic systems. Unlike cloud or desktop settings, mobile devices combine constrained execution contexts, fragmented control interfaces, and rapidly changing application states. As large language models (LLMs) evolve from conversational assistants to action-oriented agents, achieving reliable smartphone-native autonomy requires rethinking how reasoning and control are composed.
We introduce ClawMobile as a concrete exploration of this design space. ClawMobile adopts a hierarchical architecture that separates high-level language reasoning from structured, deterministic control pathways, improving execution stability and reproducibility on real devices. Using ClawMobile as a case study, we distill the design principles for mobile LLM runtimes and identify key challenges in efficiency, adaptability, and stability. We argue that building robust smartphone-native agentic systems demands principled coordination between probabilistic planning and deterministic system interfaces. The implementation is open-sourced at https://github.com/ClawMobile/ClawMobile to facilitate future exploration.
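To make the separation between probabilistic planning and deterministic control more concrete, the following is a minimal sketch of the general pattern, not ClawMobile's actual code. Every name in it (`Action`, `DeterministicExecutor`, `stub_planner`, `run`, and the action names `open_app` and `tap`) is hypothetical and chosen for illustration only. The planner is a fixed stub standing in for an LLM, and the executor only runs actions that appear in a pre-registered table.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """A structured action proposed by the planner (hypothetical schema)."""
    name: str
    args: Dict[str, str] = field(default_factory=dict)


class DeterministicExecutor:
    """Control layer: runs only actions registered ahead of time."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Dict[str, str]], None]] = {}
        self.log: List[str] = []

    def register(self, name: str, handler: Callable[[Dict[str, str]], None]) -> None:
        self._handlers[name] = handler

    def execute(self, action: Action) -> bool:
        handler = self._handlers.get(action.name)
        if handler is None:
            # Unknown actions are rejected instead of being improvised.
            self.log.append(f"rejected:{action.name}")
            return False
        handler(action.args)
        self.log.append(f"ok:{action.name}")
        return True


def stub_planner(goal: str) -> List[Action]:
    """Stand-in for an LLM planner; a real planner's output would vary."""
    return [
        Action("open_app", {"package": "com.example.notes"}),
        Action("tap", {"target": "new_note"}),
        Action("fly"),  # deliberately not a registered action
    ]


def run(goal: str,
        planner: Callable[[str], List[Action]],
        executor: DeterministicExecutor) -> List[bool]:
    """Pass each planned action through the deterministic layer in order."""
    return [executor.execute(action) for action in planner(goal)]
```

In this sketch the planner is free to propose anything, but only registered handlers ever touch the device, so the execution log for a given plan is reproducible. A real system would also need state observation and recovery logic, which are omitted here.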