🤖 AI Summary
This work addresses the reliability challenges faced by computer-use agents (CUAs) operating in real-world software environments, where perception, decision-making, and execution must remain consistently aligned with user intent. The paper proposes a unified framework that combines a three-layer architecture (Perception, Decision, and Execution) with a four-phase lifecycle (Creation, Deployment, Operation, and Maintenance). The framework is used to analyze where agent failures originate and where they become visible, and to map the interfaces at which intervention is possible. This clarifies the relationships among capability formation, permission exposure, and failure manifestation. By synthesizing representative systems, benchmarks, and advances in security and privacy, the study identifies key open problems, including controllable grounding, sustained constraint adherence, and secure permission binding, and outlines a structured research path toward controllable and continuously reliable agent behavior.
📝 Abstract
Computer-use agents (CUAs) are moving from bounded benchmarks toward real software environments, where they operate browsers, desktops, mobile applications, filesystems, terminals, and tool backends. In such settings, reliability is no longer captured by task success alone: perception errors, planning drift, memory use, tool mediation, permission scope, and runtime oversight jointly determine whether agent actions remain aligned with user intent. Existing surveys organize the CUA landscape by methods, platforms, benchmarks, or security threats, but less explicitly connect capability formation, authority exposure, failure manifestation, and control placement. To address this gap, the article develops an architecture-lifecycle framework for deployment-grounded reliability in CUAs. The architectural view analyzes Perception, Decision, and Execution as coupled layers that transform software observations into authority-bearing actions. The lifecycle view examines Creation, Deployment, Operation, and Maintenance as stages in which priors are learned, tools and permissions are bound, runtime trajectories are stressed, and assurance must be preserved under drift. Using this lens, the analysis synthesizes representative systems, benchmarks, and security/privacy studies; distinguishes where failures become visible from where their enabling conditions are introduced; and maps recurring intervention surfaces for control, oversight, and assurance. OpenClaw is used only as a public motivating example of an open deployment pattern, not as a verified internal case study. The conclusion highlights open challenges in controllable grounding, long-horizon constraint preservation, safe authority binding, mixed-trust runtime defense, privacy-preserving memory, and continual assurance.