🤖 AI Summary
This study addresses supply chain poisoning risks in large language model (LLM) coding agents introduced via third-party skill marketplaces, where malicious skills can hijack permissions for file writes, shell commands, and network requests. The authors propose Document-Driven Implicit Payload Execution (DDIPE), a novel attack method that embeds malicious logic within code examples and configuration templates in skill documentation, enabling payload execution through the agent’s automatic reuse of such content—without requiring explicit user triggers. This work is the first to reveal and validate this implicit execution pathway in LLM agent ecosystems, demonstrating its ability to bypass existing strong alignment and instruction filtering mechanisms. Using an LLM-powered pipeline, the authors generate adversarial skills covering 15 MITRE ATT&CK tactics, achieving evasion rates of 11.6%–33.5% across four frameworks and five models; notably, 2.5% of samples evade both detection and alignment safeguards, leading to four confirmed vulnerabilities and two implemented fixes.
📝 Abstract
LLM-based coding agents extend their capabilities via third-party agent skills distributed through open marketplaces without mandatory security review. Unlike traditional packages, these skills are executed as operational directives with system-level privileges, so a single malicious skill can compromise the host. Prior work has not examined whether supply-chain attacks can directly hijack an agent's action space, such as file writes, shell commands, and network requests, despite existing safeguards. We introduce Document-Driven Implicit Payload Execution (DDIPE), which embeds malicious logic in code examples and configuration templates within skill documentation. Because agents reuse these examples during normal tasks, the payload executes without explicit prompts. Using an LLM-driven pipeline, we generate 1,070 adversarial skills from 81 seeds across 15 MITRE ATTACK categories. Across four frameworks and five models, DDIPE achieves 11.6% to 33.5% bypass rates, while explicit instruction attacks achieve 0% under strong defenses. Static analysis detects most cases, but 2.5% evade both detection and alignment. Responsible disclosure led to four confirmed vulnerabilities and two fixes.