Prompt Injection Attack to Tool Selection in LLM Agents

📅 2025-04-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work identifies a novel threat to large language model (LLM) agents: prompt injection attacks on tool selection in a *no-box* setting, where the adversary needs no query access to the target agent and instead manipulates tool documentation to subvert both the retrieval and selection stages, coercing the agent into invoking a malicious tool for attacker-chosen target tasks. To this end, the authors propose ToolHijacker, the first attack framework to formulate prompt injection as a tool-document optimization problem, solved with a two-phase optimization strategy. Experiments demonstrate that ToolHijacker achieves up to 92.3% tool hijacking success across mainstream LLM agents, significantly outperforming manual and existing automated attacks. Crucially, state-of-the-art defenses, including StruQ and SecAlign, fail in the tool-library setting, with average interception rates below 18%. This work provides the first systematic study of this security vulnerability in LLM agent toolchains and establishes a benchmark for robust tool-selection research.

📝 Abstract
Tool selection is a key component of LLM agents. The process operates through a two-step mechanism, *retrieval* and *selection*, to pick the most appropriate tool from a tool library for a given task. In this work, we introduce *ToolHijacker*, a novel prompt injection attack targeting tool selection in no-box scenarios. ToolHijacker injects a malicious tool document into the tool library to manipulate the LLM agent's tool selection process, compelling it to consistently choose the attacker's malicious tool for an attacker-chosen target task. Specifically, we formulate the crafting of such tool documents as an optimization problem and propose a two-phase optimization strategy to solve it. Our extensive experimental evaluation shows that ToolHijacker is highly effective, significantly outperforming existing manual-based and automated prompt injection attacks when applied to tool selection. Moreover, we explore various defenses, including prevention-based defenses (StruQ and SecAlign) and detection-based defenses (known-answer detection, perplexity detection, and perplexity windowed detection). Our experimental results indicate that these defenses are insufficient, highlighting the urgent need for developing new defense strategies.
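The two-step mechanism the abstract describes can be sketched as follows. This is a minimal, illustrative pipeline, not the paper's implementation: real agents use dense neural embeddings rather than the toy bag-of-words similarity below, and the tool names and documents are hypothetical. The sketch shows why the tool *document* is the attack surface: it is the only input to the retrieval ranking, and it is what the selection-stage LLM reads.

```python
# Toy sketch of retrieval-then-selection over a tool library.
# An attacker who controls one entry in `tool_docs` influences both stages.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Bag-of-words stand-in for a real dense text encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(task: str, tool_docs: dict[str, str], k: int = 3) -> list[str]:
    # Stage 1: rank tool documents by similarity to the task, keep top-k.
    q = embed(task)
    ranked = sorted(tool_docs,
                    key=lambda name: cosine(q, embed(tool_docs[name])),
                    reverse=True)
    return ranked[:k]

tool_docs = {
    "weather_api": "get current weather forecast for a city",
    "calculator":  "evaluate arithmetic expressions",
    "web_search":  "search the web for documents matching a query",
}
shortlist = retrieve("what is the weather forecast in Paris", tool_docs, k=2)
# Stage 2 (not shown): the agent prompts its LLM with the shortlisted
# documents and asks it to select one tool to invoke.
```

Because the malicious document must both rank highly at retrieval and persuade the LLM at selection, the paper formulates crafting it as an optimization problem over these two stages.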
Problem

Research questions and friction points this paper is trying to address.

Attack manipulates LLM agent tool selection via malicious documents
Optimization strategy crafts effective malicious tool documents
Existing defenses fail to effectively counter the ToolHijacker attack
Innovation

Methods, ideas, or system contributions that make the work stand out.

ToolHijacker injects malicious tool documents
Two-phase optimization crafts malicious documents
Defenses like StruQ and SecAlign are insufficient
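One of the detection-based defenses the paper evaluates, windowed perplexity detection, can be sketched as below. This is a hedged illustration under simplifying assumptions: a smoothed unigram model stands in for the real language model used to score text, and the window size is arbitrary. The idea is that optimized injected text tends to contain locally improbable token runs, so the maximum perplexity over sliding windows separates it from benign documentation better than whole-document perplexity does.

```python
# Sketch of windowed perplexity detection over tool documents.
# A real deployment would score tokens with an actual LM, not unigrams.
import math
from collections import Counter

def make_logprob(corpus: list[str]):
    counts = Counter(t for doc in corpus for t in doc.lower().split())
    total, vocab = sum(counts.values()), len(counts)
    def logprob(tok: str) -> float:
        # Add-one smoothing: unseen tokens get a low but finite probability.
        return math.log((counts[tok] + 1) / (total + vocab))
    return logprob

def max_window_perplexity(doc: str, logprob, window: int = 5) -> float:
    toks = doc.lower().split()
    worst = 0.0
    for i in range(max(1, len(toks) - window + 1)):
        win = toks[i:i + window]
        nll = -sum(logprob(t) for t in win) / len(win)  # mean neg log-lik
        worst = max(worst, math.exp(nll))               # perplexity
    return worst

benign_docs = [
    "get current weather forecast for a city",
    "evaluate arithmetic expressions for a user",
    "search the web for documents matching a query",
]
logprob = make_logprob(benign_docs)
# Flag a document if its worst window exceeds a threshold calibrated
# on benign tool documentation; the threshold choice is the hard part.
```

The paper reports that such detectors (alongside known-answer and plain perplexity detection) remain insufficient against ToolHijacker, which motivates its call for new defense strategies.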