🤖 AI Summary
This study examines the semantic supply-chain risk that arises when SKILL.md files function as operational metadata rather than passive documentation. Focusing on the discovery, selection, and governance stages of the Agent Skill lifecycle, the authors study attacks that modify only SKILL.md text: retrieval manipulation, description framing, and semantic evasion. Evaluations on real ClawHub skills under realistic registry mechanisms show that the attacks are effective. In discovery, short textual triggers achieve up to an 86% pairwise win rate and 80% Top-10 placement. In selection, agents choose the adversarial variant in 77.6% of paired trials on average. In governance, malicious skills avoid a blocking verdict in 36.5% to 100% of cases. The work shows that natural-language metadata is a significant attack surface for AI agents.
📝 Abstract
Autonomous AI agents increasingly extend their capabilities through Agent Skills: modular filesystem packages whose SKILL.md files describe when and how agents should use them. While this design enables scalable, on-demand capability expansion, it also introduces a semantic supply-chain risk in which natural-language metadata and instructions can affect which skills are admitted, surfaced, selected, and loaded. We study SKILL.md-only attacks across three registry-facing stages of the Agent Skill lifecycle, using real ClawHub skills and realistic registry mechanisms. In Discovery, short textual triggers can manipulate embedding-based retrieval and improve adversarial skill visibility, achieving up to 86% pairwise win rate and 80% Top-10 placement. In Selection, description-only framing biases agents toward functionally equivalent adversarial variants, which are selected in 77.6% of paired trials on average. In Governance, semantic evasion strategies cause malicious skills to avoid a blocking verdict in 36.5% to 100% of cases. Overall, our results show that SKILL.md is not passive documentation but operational text that shapes which third-party capabilities agents find, trust, and use.
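The two Discovery metrics can be read as simple ratios over per-query ranks. The sketch below is only an illustration under stated assumptions, not the authors' evaluation code: it assumes a "pairwise win" means the adversarial variant ranks strictly above its unmodified counterpart for the same query, and that "Top-10 placement" means the adversarial variant appears at rank 10 or better. The function names and the rank values are hypothetical.

```python
from typing import Sequence, Tuple


def pairwise_win_rate(rank_pairs: Sequence[Tuple[int, int]]) -> float:
    """Fraction of (adversarial_rank, baseline_rank) pairs won by the adversarial skill.

    ASSUMPTION: a win is a strictly better (lower) rank; ties count as losses.
    """
    if not rank_pairs:
        raise ValueError("rank_pairs must not be empty")
    wins = sum(1 for adv, base in rank_pairs if adv < base)
    return wins / len(rank_pairs)


def top_k_rate(ranks: Sequence[int], k: int = 10) -> float:
    """Fraction of queries where the skill lands at rank k or better (1-indexed)."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    hits = sum(1 for r in ranks if r <= k)
    return hits / len(ranks)


# Hypothetical ranks for five queries: (adversarial variant, original skill).
example_pairs = [(3, 12), (8, 5), (1, 40), (15, 22), (7, 7)]

win_rate = pairwise_win_rate(example_pairs)
top10 = top_k_rate([adv for adv, _ in example_pairs], k=10)
print(f"pairwise win rate: {win_rate:.2f}, Top-10 placement: {top10:.2f}")
```

With these made-up ranks, the adversarial variant wins three of five pairs and lands in the Top-10 for four of five queries. How ties are scored and how ranks are aggregated across queries are choices the abstract does not specify.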