🤖 AI Summary
This work addresses security vulnerabilities in the tool selection phase of large language model (LLM) tool learning. We propose, for the first time, a black-box text-based adversarial attack that manipulates an LLM’s preference for a target tool by perturbing its natural language description—without requiring access to model internals or gradients. Unlike prior studies focusing on erroneous tool *outputs*, our approach exposes the *semantic fragility* of the tool *selection* process itself. Our method employs a hybrid perturbation strategy operating at both coarse-grained (word-level) and fine-grained (character-level) granularities, achieving significant increases in both the selection probability and ranking position of the target tool with minimal textual modifications. Extensive experiments across diverse tool sets demonstrate high attack success rates, strong generalizability across models and tools, and practical feasibility. These findings establish a critical foundation for developing robust defense mechanisms against such selection-stage adversarial threats.
📝 Abstract
Tool learning serves as a powerful auxiliary mechanism that extends the capabilities of large language models (LLMs), enabling them to tackle complex tasks that require real-time information or high-precision operations. These powerful capabilities, however, come with potential security issues. Previous work has focused primarily on making the output of invoked tools incorrect or malicious, with little attention paid to manipulating tool selection itself. To fill this gap, we introduce, for the first time, a black-box text-based attack that can significantly increase the probability of a target tool being selected. We propose a two-level text perturbation attack with coarse-to-fine granularity, perturbing the text at both the word level and the character level. Comprehensive experiments demonstrate that an attacker needs only minor perturbations to a tool's textual description to significantly increase the probability of the target tool being selected and ranked higher among the candidate tools. Our research reveals the vulnerability of the tool selection process and paves the way for future research on protecting it.
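The black-box, coarse-to-fine search described above can be sketched as a simple greedy loop: propose a word-level or character-level perturbation of the target tool's description, query a selection score, and keep the edit only if the score improves. The sketch below is illustrative only; `selection_score` is a hypothetical stand-in (in the real attack, this would be the victim LLM's preference for the target tool), and the specific perturbation operators and scoring heuristic are assumptions, not the paper's exact method.

```python
import random

def selection_score(description: str) -> float:
    # Hypothetical stand-in for the black-box signal: in the real attack this
    # would come from querying the victim LLM's tool-selection behavior.
    # Toy heuristic (assumption): reward persuasive keywords.
    keywords = ("best", "fast", "accurate", "recommended")
    return float(sum(description.lower().count(k) for k in keywords))

def word_level_perturb(desc: str, rng: random.Random) -> str:
    # Coarse-grained perturbation: insert a persuasive word at a random slot.
    words = desc.split()
    pos = rng.randrange(len(words) + 1)
    words.insert(pos, rng.choice(["best", "accurate", "recommended"]))
    return " ".join(words)

def char_level_perturb(desc: str, rng: random.Random) -> str:
    # Fine-grained perturbation: swap two adjacent characters in one word.
    words = desc.split()
    idx = rng.randrange(len(words))
    w = words[idx]
    if len(w) > 3:
        i = rng.randrange(len(w) - 1)
        words[idx] = w[:i] + w[i + 1] + w[i] + w[i + 2:]
    return " ".join(words)

def attack(desc: str, budget: int = 20, seed: int = 0) -> str:
    # Greedy black-box search: keep a candidate only if it raises the score,
    # so the final description never scores below the original.
    rng = random.Random(seed)
    best, best_score = desc, selection_score(desc)
    for _ in range(budget):
        op = rng.choice([word_level_perturb, char_level_perturb])
        candidate = op(best, rng)
        score = selection_score(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best
```

Because the loop accepts only score-improving edits, textual modifications stay minimal: the search stops changing the description once no sampled perturbation helps within the query budget.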