Prompt Injection Attack to Tool Selection in LLM Agents

📅 2025-04-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work identifies a novel threat to large language model (LLM) agents: prompt injection attacks on tool selection in a *no-box* setting, where the adversary needs no query access to the target agent and instead manipulates tool documentation to subvert both the retrieval and selection stages, coercing the agent into invoking a malicious tool for attacker-chosen target tasks. To this end, the authors propose ToolHijacker, the first attack framework to formulate prompt injection as a tool-document optimization problem, solved with a two-phase optimization strategy. Experiments demonstrate that ToolHijacker achieves up to 92.3% tool hijacking success across mainstream LLM agents, significantly outperforming manual and existing automated attacks. Crucially, state-of-the-art defenses, including StruQ and SecAlign, fail in the tool-library setting, with average interception rates below 18%. This work provides the first systematic study of this security vulnerability in LLM agent toolchains and establishes a benchmark for robust tool-selection research.

📝 Abstract
Tool selection is a key component of LLM agents. The process operates through a two-step mechanism, *retrieval* and *selection*, to pick the most appropriate tool from a tool library for a given task. In this work, we introduce *ToolHijacker*, a novel prompt injection attack targeting tool selection in no-box scenarios. ToolHijacker injects a malicious tool document into the tool library to manipulate the LLM agent's tool selection process, compelling it to consistently choose the attacker's malicious tool for an attacker-chosen target task. Specifically, we formulate the crafting of such tool documents as an optimization problem and propose a two-phase optimization strategy to solve it. Our extensive experimental evaluation shows that ToolHijacker is highly effective, significantly outperforming existing manual-based and automated prompt injection attacks when applied to tool selection. Moreover, we explore various defenses, including prevention-based defenses (StruQ and SecAlign) and detection-based defenses (known-answer detection, perplexity detection, and perplexity windowed detection). Our experimental results indicate that these defenses are insufficient, highlighting the urgent need for developing new defense strategies.
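The two-step mechanism the abstract describes can be sketched as follows. This is a minimal, illustrative pipeline, not the paper's implementation: real agents use dense neural embeddings rather than the toy bag-of-words similarity below, and the tool names and documents are hypothetical. The sketch shows why the tool *document* is the attack surface: it is the only input to the retrieval ranking, and it is what the selection-stage LLM reads.

```python
# Toy sketch of retrieval-then-selection over a tool library.
# An attacker who controls one entry in `tool_docs` influences both stages.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Bag-of-words stand-in for a real dense text encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(task: str, tool_docs: dict[str, str], k: int = 3) -> list[str]:
    # Stage 1: rank tool documents by similarity to the task, keep top-k.
    q = embed(task)
    ranked = sorted(tool_docs,
                    key=lambda name: cosine(q, embed(tool_docs[name])),
                    reverse=True)
    return ranked[:k]

tool_docs = {
    "weather_api": "get current weather forecast for a city",
    "calculator":  "evaluate arithmetic expressions",
    "web_search":  "search the web for documents matching a query",
}
shortlist = retrieve("what is the weather forecast in Paris", tool_docs, k=2)
# Stage 2 (not shown): the agent prompts its LLM with the shortlisted
# documents and asks it to select one tool to invoke.
```

Because the malicious document must both rank highly at retrieval and persuade the LLM at selection, the paper formulates crafting it as an optimization problem over these two stages.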
Problem

Research questions and friction points this paper is trying to address.

Attack manipulates LLM agent tool selection via malicious documents
Optimization strategy crafts effective malicious tool documents
Existing defenses fail to effectively counter the ToolHijacker attack
Innovation

Methods, ideas, or system contributions that make the work stand out.

ToolHijacker injects malicious tool documents
Two-phase optimization crafts malicious documents
Defenses like StruQ and SecAlign are insufficient
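One of the detection-based defenses the paper evaluates, windowed perplexity detection, can be sketched as below. This is a hedged illustration under simplifying assumptions: a smoothed unigram model stands in for the real language model used to score text, and the window size is arbitrary. The idea is that optimized injected text tends to contain locally improbable token runs, so the maximum perplexity over sliding windows separates it from benign documentation better than whole-document perplexity does.

```python
# Sketch of windowed perplexity detection over tool documents.
# A real deployment would score tokens with an actual LM, not unigrams.
import math
from collections import Counter

def make_logprob(corpus: list[str]):
    counts = Counter(t for doc in corpus for t in doc.lower().split())
    total, vocab = sum(counts.values()), len(counts)
    def logprob(tok: str) -> float:
        # Add-one smoothing: unseen tokens get a low but finite probability.
        return math.log((counts[tok] + 1) / (total + vocab))
    return logprob

def max_window_perplexity(doc: str, logprob, window: int = 5) -> float:
    toks = doc.lower().split()
    worst = 0.0
    for i in range(max(1, len(toks) - window + 1)):
        win = toks[i:i + window]
        nll = -sum(logprob(t) for t in win) / len(win)  # mean neg log-lik
        worst = max(worst, math.exp(nll))               # perplexity
    return worst

benign_docs = [
    "get current weather forecast for a city",
    "evaluate arithmetic expressions for a user",
    "search the web for documents matching a query",
]
logprob = make_logprob(benign_docs)
# Flag a document if its worst window exceeds a threshold calibrated
# on benign tool documentation; the threshold choice is the hard part.
```

The paper reports that such detectors (alongside known-answer and plain perplexity detection) remain insufficient against ToolHijacker, which motivates its call for new defense strategies.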