Blind Gods and Broken Screens: Architecting a Secure, Intent-Centric Mobile Agent Operating System

📅 2026-02-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes Aura, an intent-centric operating system architecture for mobile agents. It targets security vulnerabilities inherent in the prevailing "screen-as-interface" paradigm, such as visual spoofing, identity impersonation, and unauthorized execution. Aura adopts a hub-and-spoke topology in which a system agent orchestrates user intents, while sandboxed application agents perform tasks under unified, kernel-enforced communication and security policies. By replacing fragile GUI scraping with structured, agent-native interactions, Aura establishes an intent-based secure execution model. It introduces a global agent registry for cryptographically bound identities, a multi-layer semantic firewall for input sanitization, taint-aware memory with plan-trajectory alignment to protect cognitive integrity, and fine-grained access control coupled with non-repudiable auditing. Evaluated on MobileSafetyBench against Doubao, Aura raises the success rate on low-risk tasks from roughly 75% to 94.3% (about +19.3 percentage points) and lowers the attack success rate on high-risk tasks from roughly 40% to 4.4% (about −35.6 percentage points), while cutting latency by nearly an order of magnitude.

📝 Abstract

The evolution of Large Language Models (LLMs) has shifted mobile computing from App-centric interactions to system-level autonomous agents. Current implementations predominantly rely on a "Screen-as-Interface" paradigm, which inherits structural vulnerabilities and conflicts with the mobile ecosystem's economic foundations. In this paper, we conduct a systematic security analysis of state-of-the-art mobile agents using Doubao Mobile Assistant as a representative case. We decompose the threat landscape into four dimensions (Agent Identity, External Interface, Internal Reasoning, and Action Execution), revealing critical flaws such as fake App identity, visual spoofing, indirect prompt injection, and unauthorized privilege escalation, all stemming from a reliance on unstructured visual data. To address these challenges, we propose Aura, an Agent Universal Runtime Architecture for a clean-slate secure agent OS. Aura replaces brittle GUI scraping with a structured, agent-native interaction model. It adopts a Hub-and-Spoke topology where a privileged System Agent orchestrates intent, sandboxed App Agents execute domain-specific tasks, and the Agent Kernel mediates all communication. The Agent Kernel enforces four defense pillars: (i) cryptographic identity binding via a Global Agent Registry; (ii) semantic input sanitization through a multilayer Semantic Firewall; (iii) cognitive integrity via taint-aware memory and plan-trajectory alignment; and (iv) granular access control with non-repudiable auditing. Evaluation on MobileSafetyBench shows that, compared to Doubao, Aura improves low-risk Task Success Rate from roughly 75% to 94.3%, reduces high-risk Attack Success Rate from roughly 40% to 4.4%, and achieves near-order-of-magnitude latency gains. These results demonstrate Aura as a viable, secure alternative to the "Screen-as-Interface" paradigm.
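
To make the Hub-and-Spoke mediation described in the abstract more concrete, the following is a minimal sketch, not the paper's implementation. Every name in it (`Intent`, `AgentKernel`, `sign`, `chain_is_intact`, the `SUSPICIOUS` deny-list) is hypothetical. It assumes an HMAC tag as a stand-in for the registry's cryptographic identity binding, a toy substring check as a stand-in for the Semantic Firewall, and a hash-chained log as a stand-in for auditing. A shared-key HMAC and a hash chain give tamper evidence only; real non-repudiation would need asymmetric signatures. Taint-aware memory and plan-trajectory alignment are not modeled.

```python
import hashlib
import hmac
from dataclasses import dataclass, field, replace

# Toy deny-list standing in for a semantic firewall (assumption, not from the paper).
SUSPICIOUS = ("ignore previous instructions", "system prompt")


@dataclass(frozen=True)
class Intent:
    sender: str    # id of the requesting agent
    target: str    # id of the agent asked to act
    action: str    # structured action name
    payload: str   # untrusted free-text argument
    tag: str = ""  # HMAC over the fields above


def _message(intent):
    # Join fields with a separator that should not appear in agent ids.
    return "\x1f".join(
        (intent.sender, intent.target, intent.action, intent.payload)
    ).encode()


def sign(intent, key):
    """Return a copy of the intent carrying the sender's HMAC tag."""
    tag = hmac.new(key, _message(intent), hashlib.sha256).hexdigest()
    return replace(intent, tag=tag)


def chain_is_intact(audit):
    """Recompute the hash chain and report whether any record was altered."""
    prev = "0" * 64
    for entry, digest in audit:
        if hashlib.sha256((prev + entry).encode()).hexdigest() != digest:
            return False
        prev = digest
    return True


@dataclass
class AgentKernel:
    keys: dict = field(default_factory=dict)      # registry: agent id -> key
    allowed: dict = field(default_factory=dict)   # agent id -> permitted actions
    handlers: dict = field(default_factory=dict)  # agent id -> callable
    audit: list = field(default_factory=list)     # (entry, chained digest)

    def register(self, agent_id, key, allowed_actions=(), handler=None):
        self.keys[agent_id] = key
        self.allowed[agent_id] = set(allowed_actions)
        if handler is not None:
            self.handlers[agent_id] = handler

    def _check(self, intent):
        key = self.keys.get(intent.sender)
        if key is None or intent.target not in self.handlers:
            return "unknown agent"
        expected = hmac.new(key, _message(intent), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected.encode(), intent.tag.encode()):
            return "bad identity tag"
        if any(s in intent.payload.lower() for s in SUSPICIOUS):
            return "firewall"
        if intent.action not in self.allowed[intent.sender]:
            return "not permitted"
        return "allow"

    def _record(self, intent, verdict):
        prev = self.audit[-1][1] if self.audit else "0" * 64
        entry = f"{intent.sender}->{intent.target}:{intent.action}:{verdict}"
        digest = hashlib.sha256((prev + entry).encode()).hexdigest()
        self.audit.append((entry, digest))

    def dispatch(self, intent):
        """Every message passes through the kernel; all verdicts are logged."""
        verdict = self._check(intent)
        self._record(intent, verdict)
        if verdict != "allow":
            return f"denied: {verdict}"
        return self.handlers[intent.target](intent)


kernel = AgentKernel()
kernel.register("system", b"system-key", allowed_actions={"send_payment"})
kernel.register("pay_app", b"pay-key", handler=lambda i: f"paid: {i.payload}")

request = Intent("system", "pay_app", "send_payment", "10 to Bob")
ok = kernel.dispatch(sign(request, b"system-key"))
forged = kernel.dispatch(replace(request, tag="forged"))
```

In this sketch the spokes never call each other directly: the only path to a handler is `dispatch`, so identity, input, and permission checks cannot be bypassed, and both allowed and denied requests land in the audit chain.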

Problem

Research questions and friction points this paper is trying to address.

mobile agents

screen-as-interface

security vulnerabilities

LLM-based systems

intent-centric architecture

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agent Operating System

Intent-Centric Architecture

Semantic Firewall