Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections

📅 2025-10-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work exposes a critical security vulnerability in the Agent Skills framework: although designed to enable LLMs to dynamically acquire new knowledge via Markdown-formatted skill files, it is highly susceptible to minimalist prompt injection attacks. The authors demonstrate that adversaries can exploit this by crafting lengthy skill files containing malicious script references and leveraging users’ “do-not-ask-again” authorization—granted for semantically similar tasks—to migrate permissions across tasks, thereby bypassing system-level safeguards in mainstream coding agents. Experiments confirm that sensitive data—including local files and authentication credentials—can be exfiltrated without requiring sophisticated adversarial examples. This is the first systematic study revealing how Agent Skills mechanisms can be abused for low-threshold prompt injection, and establishing that permission approval states exhibit cross-task transferability. The findings provide empirical evidence and urgent design guidance for securing LLM-based agent systems against such privilege escalation and unauthorized information access.

Technology Category

Application Category

📝 Abstract

Enabling continual learning in LLMs remains a key unresolved research challenge. In a recent announcement, a frontier LLM company made a step towards this by introducing Agent Skills, a framework that equips agents with new knowledge based on instructions stored in simple markdown files. Although Agent Skills can be a very useful tool, we show that they are fundamentally insecure, since they enable trivially simple prompt injections. We demonstrate how to hide malicious instructions in long Agent Skill files and referenced scripts to exfiltrate sensitive data, such as internal files or passwords. Importantly, we show how to bypass system-level guardrails of a popular coding agent: a benign, task-specific approval with the "Don't ask again" option can carry over to closely related but harmful actions. Overall, we conclude that despite ongoing research efforts and scaling model capabilities, frontier LLMs remain vulnerable to very simple prompt injections in realistic scenarios. Our code is available at https://github.com/aisa-group/promptinject-agent-skills.

Problem

Research questions and friction points this paper is trying to address.

Agent Skills framework enables simple prompt injections in LLMs

Malicious instructions can exfiltrate sensitive data from systems

System guardrails can be bypassed using task-specific approvals

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agent Skills framework enables new knowledge via markdown files

Hiding malicious instructions in long skill files exfiltrates data

Bypassing system guardrails with benign task approval enables harm

🔎 Similar Papers

Deception in Reinforced Autonomous Agents

2024-05-07Citations: 1

Authors to Follow