Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

📅 2026-02-23

📈 Citations: 0

✨ Influential: 0

career value

198K/year

🤖 AI Summary

This work addresses a novel security vulnerability in large language model (LLM) agents that integrate third-party skill files: prompt injection attacks capable of triggering data exfiltration or destructive actions. The study presents the first systematic characterization and quantification of such attacks within skill files, introducing SkillInject—a comprehensive benchmark comprising 202 attack-task pairs spanning explicit to contextually concealed injection strategies. A red-teaming evaluation framework is developed to jointly assess both safety and functional performance across state-of-the-art LLMs. Experimental results reveal alarming attack success rates of up to 80%, underscoring the critical need for context-aware authorization mechanisms rather than reliance solely on model scaling or naive input filtering.

Technology Category

Application Category

📝 Abstract

LLM agents are evolving rapidly, powered by code execution, tools, and the recently introduced agent skills feature. Skills allow users to extend LLM applications with specialized third-party code, knowledge, and instructions. Although this can extend agent capabilities to new domains, it creates an increasingly complex agent supply chain, offering new surfaces for prompt injection attacks. We identify skill-based prompt injection as a significant threat and introduce SkillInject, a benchmark evaluating the susceptibility of widely-used LLM agents to injections through skill files. SkillInject contains 202 injection-task pairs with attacks ranging from obviously malicious injections to subtle, context-dependent attacks hidden in otherwise legitimate instructions. We evaluate frontier LLMs on SkillInject, measuring both security in terms of harmful instruction avoidance and utility in terms of legitimate instruction compliance. Our results show that today's agents are highly vulnerable with up to 80% attack success rate with frontier models, often executing extremely harmful instructions including data exfiltration, destructive action, and ransomware-like behavior. They furthermore suggest that this problem will not be solved through model scaling or simple input filtering, but that robust agent security will require context-aware authorization frameworks. Our benchmark is available at https://www.skill-inject.com/.

Problem

Research questions and friction points this paper is trying to address.

prompt injection

LLM agents

skill files

security vulnerability

agent supply chain

Innovation

Methods, ideas, or system contributions that make the work stand out.

skill-based prompt injection

LLM agent security

SkillInject benchmark