🤖 AI Summary
This work addresses security vulnerabilities in the tool selection phase of large language model (LLM) tool learning. We propose, for the first time, a black-box text-based adversarial attack that manipulates an LLM’s preference for a target tool by perturbing its natural language description—without requiring access to model internals or gradients. Unlike prior studies focusing on erroneous tool *outputs*, our approach exposes the *semantic fragility* of the tool *selection* process itself. Our method employs a hybrid perturbation strategy operating at both coarse-grained (word-level) and fine-grained (character-level) granularities, achieving significant increases in both the selection probability and ranking position of the target tool with minimal textual modifications. Extensive experiments across diverse tool sets demonstrate high attack success rates, strong generalizability across models and tools, and practical feasibility. These findings establish a critical foundation for developing robust defense mechanisms against such selection-stage adversarial threats.
📝 Abstract
Tool learning serves as a powerful auxiliary mechanism that extends the capabilities of large language models (LLMs), enabling them to tackle complex tasks that require real-time information or high-precision operations. These powerful capabilities, however, come with potential security issues. Previous work has focused primarily on making the output of invoked tools incorrect or malicious, with little attention paid to manipulating tool selection itself. To fill this gap, we introduce, for the first time, a black-box text-based attack that can significantly increase the probability of a target tool being selected. We propose a two-level text perturbation attack with coarse-to-fine granularity, perturbing the text at both the word level and the character level. Comprehensive experiments demonstrate that an attacker needs only minor perturbations to a tool's textual description to significantly increase the probability of the target tool being selected and ranked higher among the candidate tools. Our research reveals the vulnerability of the tool selection process and paves the way for future research on protecting it.
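The black-box, coarse-to-fine search described above can be sketched as a simple greedy loop: propose a word-level or character-level perturbation of the target tool's description, query a selection score, and keep the edit only if the score improves. The sketch below is illustrative only; `selection_score` is a hypothetical stand-in (in the real attack, this would be the victim LLM's preference for the target tool), and the specific perturbation operators and scoring heuristic are assumptions, not the paper's exact method.

```python
import random

def selection_score(description: str) -> float:
    # Hypothetical stand-in for the black-box signal: in the real attack this
    # would come from querying the victim LLM's tool-selection behavior.
    # Toy heuristic (assumption): reward persuasive keywords.
    keywords = ("best", "fast", "accurate", "recommended")
    return float(sum(description.lower().count(k) for k in keywords))

def word_level_perturb(desc: str, rng: random.Random) -> str:
    # Coarse-grained perturbation: insert a persuasive word at a random slot.
    words = desc.split()
    pos = rng.randrange(len(words) + 1)
    words.insert(pos, rng.choice(["best", "accurate", "recommended"]))
    return " ".join(words)

def char_level_perturb(desc: str, rng: random.Random) -> str:
    # Fine-grained perturbation: swap two adjacent characters in one word.
    words = desc.split()
    idx = rng.randrange(len(words))
    w = words[idx]
    if len(w) > 3:
        i = rng.randrange(len(w) - 1)
        words[idx] = w[:i] + w[i + 1] + w[i] + w[i + 2:]
    return " ".join(words)

def attack(desc: str, budget: int = 20, seed: int = 0) -> str:
    # Greedy black-box search: keep a candidate only if it raises the score,
    # so the final description never scores below the original.
    rng = random.Random(seed)
    best, best_score = desc, selection_score(desc)
    for _ in range(budget):
        op = rng.choice([word_level_perturb, char_level_perturb])
        candidate = op(best, rng)
        score = selection_score(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best
```

Because the loop accepts only score-improving edits, textual modifications stay minimal: the search stops changing the description once no sampled perturbation helps within the query budget.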