🤖 AI Summary
This study addresses the critical gap in empirical research and labeled data concerning malicious behaviors in third-party large language model (LLM) agent skills. We present the first open-source, annotated dataset of malicious agent skills, derived from a large-scale behavioral analysis of 98,380 skills in community registries. Our investigation identifies 157 malicious skills containing 632 vulnerabilities, uncovering two dominant attack paradigms, data exfiltration and agent hijacking, and revealing sophisticated exploitation techniques targeting shadow features and platform hook systems. By integrating behavioral verification, vulnerability discovery, kill-chain modeling, and a responsible disclosure framework, our approach facilitated the removal of 93.6% of identified malicious skills within 30 days of disclosure, establishing essential infrastructure for advancing LLM agent security research.
📝 Abstract
Third-party agent skills extend LLM-based agents with instruction files and executable code that run on users' machines. Skills execute with user privileges and are distributed through community registries with minimal vetting, yet no ground-truth dataset exists to characterize the resulting threats. We construct the first labeled dataset of malicious agent skills by behaviorally verifying 98,380 skills from two community registries, confirming 157 malicious skills with 632 vulnerabilities. These attacks are not incidental. Malicious skills average 4.03 vulnerabilities across a median of three kill-chain phases, and the ecosystem has split into two archetypes: Data Thieves, which exfiltrate credentials through supply-chain techniques, and Agent Hijackers, which subvert agent decision-making through instruction manipulation. A single actor accounts for 54.1% of confirmed cases through templated brand impersonation. Shadow features, capabilities absent from public documentation, appear in 0% of basic attacks but in 100% of advanced ones; several skills go further by exploiting the AI platform's own hook system and permission flags. Responsible disclosure led to 93.6% removal within 30 days. We release the dataset and analysis pipeline to support future work on agent skill security.