🤖 AI Summary
This work addresses the security challenge that arises when host-based autonomous agents, given only high-level user objectives, generate execution plans without sufficient semantic constraints (such as process boundaries or safety limits), which can lead to high-risk behaviors. We propose the first semantics-aware threat model tailored specifically to host agents, systematically analyzing the semantic completion process from abstract goals to executable plans and identifying multiple risk-inducing completion patterns. Through execution-trace analysis, a case study on OpenClaw, and explicit modeling of safety boundaries, we derive defensive design principles that constrain hazardous semantic completions and clarify operational boundaries. Our findings offer both a theoretical foundation and practical guidance for building secure and reliable autonomous agent systems.
📝 Abstract
Host-acting agents promise a convenient interaction model in which users specify goals and the system determines how to realize them. We argue that this convenience introduces a distinct security problem: semantic under-specification of user goals. User instructions are typically goal-oriented, yet they often leave process constraints, safety boundaries, persistence, and exposure insufficiently specified. As a result, the agent must complete the missing execution semantics before acting, and this completion can produce risky host-side plans even when the user-stated goal is benign. In this paper, we develop a semantic threat model, present a taxonomy of semantically induced risky completion patterns, and study the phenomenon through an OpenClaw-centered case study and execution-trace analysis. We further derive defense design principles for making execution boundaries explicit and constraining risky completion. These findings suggest that securing host-acting agents requires governing not only which actions are allowed at execution time, but also how goal-only instructions are translated into executable plans.