Select Me! When You Need a Tool: A Black-box Text Attack on Tool Selection

📅 2025-04-07
📈 Citations: 0 · Influential citations: 0
🤖 AI Summary
This work addresses security vulnerabilities in the tool selection phase of large language model (LLM) tool learning. We propose, for the first time, a black-box text-based adversarial attack that manipulates an LLM’s preference for a target tool by perturbing its natural language description—without requiring access to model internals or gradients. Unlike prior studies focusing on erroneous tool *outputs*, our approach exposes the *semantic fragility* of the tool *selection* process itself. Our method employs a hybrid perturbation strategy operating at both coarse-grained (word-level) and fine-grained (character-level) granularities, significantly raising both the selection probability and the ranking of the target tool with minimal textual modifications. Extensive experiments across diverse tool sets demonstrate high attack success rates, strong generalizability across models and tools, and practical feasibility. These findings establish a critical foundation for developing robust defense mechanisms against such selection-stage adversarial threats.
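To make the described attack surface concrete, here is a minimal sketch of how such a coarse-to-fine black-box search could look. Everything below is illustrative rather than the authors' released code: the `llm.rank_tools` oracle, the specific word- and character-level edit operators, and the greedy acceptance rule are all assumptions standing in for whatever the paper actually uses.

```python
import random
import string

# Hypothetical black-box oracle: query the victim LLM with a user request
# and candidate tool descriptions, and return the rank (0 = chosen first)
# of the tool at `target_idx`. In practice this would wrap an API call.
def query_rank(llm, request, descriptions, target_idx):
    ranking = llm.rank_tools(request, descriptions)  # assumed interface
    return ranking.index(target_idx)

# Coarse-grained (word-level) edits: replace one word with a simple variant.
def word_level_candidates(desc):
    words = desc.split()
    for i, w in enumerate(words):
        for repl in (w.upper(), w.capitalize(), w + "s"):
            if repl != w:
                yield " ".join(words[:i] + [repl] + words[i + 1:])

# Fine-grained (character-level) edits: insert, delete, or substitute
# a single character at a random position.
def char_level_candidates(desc, n_samples=20):
    for _ in range(n_samples):
        i = random.randrange(len(desc))
        op = random.choice(("insert", "delete", "substitute"))
        c = random.choice(string.ascii_lowercase)
        if op == "insert":
            yield desc[:i] + c + desc[i:]
        elif op == "delete" and len(desc) > 1:
            yield desc[:i] + desc[i + 1:]
        else:
            yield desc[:i] + c + desc[i + 1:]

def attack_description(llm, request, descriptions, target_idx, budget=50):
    """Greedy coarse-to-fine search: repeatedly accept the first single edit
    (word level first, then character level) that improves the target's rank."""
    best = descriptions[target_idx]
    best_rank = query_rank(llm, request, descriptions, target_idx)
    for stage in (word_level_candidates, char_level_candidates):
        improved = True
        while improved and budget > 0 and best_rank > 0:
            improved = False
            for cand in stage(best):
                if budget == 0:
                    break
                budget -= 1
                trial = list(descriptions)
                trial[target_idx] = cand
                rank = query_rank(llm, request, trial, target_idx)
                if rank < best_rank:
                    best, best_rank = cand, rank
                    improved = True
                    break
    return best
```

The key point the sketch illustrates is the black-box setting: each candidate description is scored purely by re-querying the victim model's tool ranking, so no gradients or model internals are needed.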

📝 Abstract
Tool learning serves as a powerful auxiliary mechanism that extends the capabilities of large language models (LLMs), enabling them to tackle complex tasks requiring real-time relevance or high-precision operations. Behind these powerful capabilities, however, lie potential security issues. Previous work has primarily focused on making the output of invoked tools incorrect or malicious, with little attention given to the manipulation of tool selection. To fill this gap, we introduce in this paper, for the first time, a black-box text-based attack that can significantly increase the probability of the target tool being selected. We propose a two-level text perturbation attack with coarse-to-fine granularity, attacking the text at both the word level and the character level. We conduct comprehensive experiments demonstrating that an attacker only needs to apply a few perturbations to a tool's textual description to significantly increase the possibility of the target tool being selected and ranked higher among the candidate tools. Our research reveals the vulnerability of the tool selection process and paves the way for future research on protecting it.
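As a complement to the search sketch above, the two quantities the abstract targets, selection probability and rank among candidates, could be estimated by repeated queries to a stochastic victim model. This again relies on the hypothetical `rank_tools` interface from the earlier sketch:

```python
def measure_attack(llm, request, descriptions, target_idx, n_trials=100):
    """Estimate the target tool's selection probability and mean rank
    via repeated black-box queries (assumes a stochastic victim model)."""
    selected = 0
    rank_sum = 0
    for _ in range(n_trials):
        ranking = llm.rank_tools(request, descriptions)  # assumed interface
        rank = ranking.index(target_idx)
        selected += (rank == 0)
        rank_sum += rank
    return selected / n_trials, rank_sum / n_trials
```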
Problem

Research questions and friction points this paper is trying to address.

Exposing security vulnerabilities in LLM tool selection process
Manipulating tool selection via black-box text-based attacks
Increasing target tool selection probability through text perturbations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Black-box text attack on tool selection
Two-level word and character perturbations
Increases target tool selection probability
Liuji Chen
Institute of Automation, Chinese Academy of Sciences
LLM Agent · Trustworthy AI
Hao Gao
School of Computer Science, Beijing University of Posts and Telecommunications
Jinghao Zhang
Kuaishou Tech
Recommender Systems · Multimedia · Large Language Model
Qiang Liu
New Laboratory of Pattern Recognition (NLPR), State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Shu Wu
New Laboratory of Pattern Recognition (NLPR), State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Liang Wang
New Laboratory of Pattern Recognition (NLPR), State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences