🤖 AI Summary
This work addresses the lack of systematic security evaluation in OpenClaw, an open-source generative agent platform, by proposing and deploying Clawdrain—a trojan skill that triggers a multi-round “segmented verification protocol” via SKILL.md instruction injection. Leveraging tool-calling chains, Clawdrain induces agents to continuously consume API tokens. In a real-world deployment using Gemini 2.5 Pro under actual billing conditions, and incorporating PROGRESS/REPAIR/TERMINAL signal feedback mechanisms, this study provides the first empirical evidence of how tool composition, recovery behaviors, and interface design influence token exhaustion attacks. Experiments demonstrate a 6–7× token amplification (up to 9× in extreme cases). Notably, agents autonomously combine general-purpose tools to bypass fragile protocol steps, thereby reducing amplification and altering attack dynamics, revealing several novel production-grade attack vectors.
📝 Abstract
Modern generative agents such as OpenClaw - an open-source, self-hosted personal assistant with a community skill ecosystem, are gaining attention and are used pervasively. However, the openness and rapid growth of these ecosystems often outpace systematic security evaluation. In this paper, we design, implement, and evaluate Clawdrain, a Trojanized skill that induces a multi-turn "Segmented Verification Protocol" via injected SKILL.md instructions and a companion script that returns PROGRESS/REPAIR/TERMINAL signals. We deploy Clawdrain in a production-like OpenClaw instance with real API billing and a production model (Gemini 2.5 Pro), and we measure 6-7x token amplification over a benign baseline, with a costly, failure configuration reaching approximately 9x. We observe a deployment-only phenomenon: the agent autonomously composes general-purpose tools (e.g., shell/Python) to route around brittle protocol steps, reducing amplification and altering attack dynamics. Finally, we identify production vectors enabled by OpenClaw's architecture, including SKILL.md prompt bloat, persistent tool-output pollution, cron/heartbeat frequency amplification, and behavioral instruction injection. Overall, we demonstrate that token-drain attacks remain feasible in real deployments, but their magnitude and observability are shaped by tool composition, recovery behavior, and interface design.