Reducing Tool Hallucination via Reliability Alignment

📅 2024-12-05
🏛️ arXiv.org
📈 Citations: 0 · Influential citations: 0
🤖 AI Summary
Large language models (LLMs) frequently exhibit tool hallucinations (erroneous tool selection or invocation) during tool-augmented reasoning, undermining reliability and safety. Method: the work formally defines and categorizes tool hallucinations, then introduces *reliability alignment*, a paradigm that expands the action space with *uncertainty-aware actions* (e.g., deferring invocation or requesting clarification) to enable calibrated decision-making. It constructs RelyToolBench, a hallucination-aware evaluation benchmark with dedicated metrics, and proposes Relign, a unified framework that integrates supervised fine-tuning and reinforcement learning for reliability-aligned training. Contribution/Results: extensive experiments show that Relign significantly reduces tool hallucination rates, improves task success rates and execution efficiency, and yields more stable, interpretable, and trustworthy LLM-tool interactions across diverse multi-step tool-chain scenarios.
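The core idea, an action space that offers indecisive alternatives alongside tool calls, is easy to picture in code. Below is a minimal Python sketch; the names (`ActionType`, `decide`) and the fixed confidence threshold are illustrative assumptions, not the paper's implementation, which learns this behavior via SFT and RL rather than thresholding.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ActionType(Enum):
    """Action space for tool-augmented reasoning. The last three
    'indecisive' actions let the model avoid forcing a tool call
    it is not confident about."""
    CALL_TOOL = auto()   # standard tool invocation
    DEFER = auto()       # postpone tool use (e.g., answer directly)
    CLARIFY = auto()     # ask the user a clarifying question
    RESELECT = auto()    # revise a previously chosen tool


@dataclass
class Action:
    kind: ActionType
    tool_name: str | None = None  # set for CALL_TOOL / RESELECT
    message: str | None = None    # set for CLARIFY / DEFER


def decide(confidence: float, tool_name: str, threshold: float = 0.7) -> Action:
    """Toy policy: invoke the tool only when calibrated confidence clears
    a threshold, otherwise fall back to an indecisive action. Relign
    learns this behavior via SFT and RL; the fixed threshold here is
    purely illustrative."""
    if confidence >= threshold:
        return Action(ActionType.CALL_TOOL, tool_name=tool_name)
    return Action(
        ActionType.CLARIFY,
        message=f"Before calling {tool_name}: please confirm the required arguments.",
    )
```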

📝 Abstract
Large Language Models (LLMs) have expanded their capabilities beyond language generation to interact with external tools, enabling automation and real-world applications. However, tool hallucinations, where models either select inappropriate tools or misuse them, pose significant challenges, leading to erroneous task execution, increased computational costs, and reduced system reliability. To systematically address this issue, we define and categorize tool hallucinations into two main types: tool selection hallucination and tool usage hallucination. To evaluate and mitigate these issues, we introduce RelyToolBench, which integrates specialized test cases and novel metrics to assess hallucination-aware task success and efficiency. Finally, we propose Relign, a reliability alignment framework that expands the tool-use action space to include indecisive actions, allowing LLMs to defer tool use, seek clarification, or adjust tool selection dynamically. Through extensive experiments, we demonstrate that Relign significantly reduces tool hallucinations, improves task reliability, and enhances the efficiency of LLM-tool interactions.
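The abstract mentions metrics for "hallucination-aware task success and efficiency" without spelling them out on this page. A plausible reading is that a trajectory counts as a success only if the task is solved and no hallucinated call occurs along the way. The sketch below implements that reading; the `Episode` schema and the exact formulas are assumptions, and the paper's definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One tool-use trajectory from the benchmark (hypothetical schema)."""
    task_solved: bool          # did the final answer satisfy the task?
    hallucinated_calls: int    # calls judged as selection/usage hallucinations
    total_calls: int           # all tool calls issued in the trajectory


def hallucination_aware_metrics(episodes: list[Episode]) -> dict[str, float]:
    """An episode only counts as a success if the task is solved *and* no
    hallucinated tool call occurred. Efficiency is approximated as the
    average number of tool calls on those clean successes."""
    n = len(episodes)
    clean = [e for e in episodes if e.task_solved and e.hallucinated_calls == 0]
    halluc_rate = sum(e.hallucinated_calls for e in episodes) / max(
        sum(e.total_calls for e in episodes), 1)
    return {
        "hallucination_aware_success": len(clean) / max(n, 1),
        "tool_hallucination_rate": halluc_rate,
        "avg_calls_per_clean_success": (
            sum(e.total_calls for e in clean) / max(len(clean), 1)),
    }
```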
Problem

Research questions and friction points this paper is trying to address.

LLMs exhibit tool hallucinations during tool-augmented tasks: they select inappropriate tools or misuse the tools they select (see the taxonomy sketch after this list)
Hallucinated tool calls lead to erroneous task execution, increased computational cost, and reduced system reliability
Models lack calibrated fallback behavior: no way to defer tool use, ask for clarification, or revise a tool choice when uncertain
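To make the paper's two-way taxonomy concrete, here is a toy classifier that labels a failed call as a selection or a usage hallucination. It assumes an oracle that knows which tool the task actually required and whether the call's arguments were well-formed; both the oracle and the function names are hypothetical, not from the paper.

```python
from enum import Enum, auto


class ToolHallucination(Enum):
    SELECTION = auto()  # a tool is chosen that cannot solve the (sub)task
    USAGE = auto()      # the right tool is chosen but invoked incorrectly


def classify_failure(chosen_tool: str, required_tool: str,
                     args_valid: bool) -> ToolHallucination | None:
    """Labels a failed call under the assumed oracle.
    Returns None when the call is clean."""
    if chosen_tool != required_tool:
        return ToolHallucination.SELECTION
    if not args_valid:
        return ToolHallucination.USAGE
    return None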
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces RelyToolBench: specialized test cases plus metrics for hallucination-aware task success and efficiency
Proposes Relign: a reliability alignment framework that combines supervised fine-tuning with reinforcement learning
Expands the tool-use action space with indecisive actions, letting the model defer, seek clarification, or adjust its tool selection dynamically (see the sketch after this list)
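One way to read "expands the tool-use action space dynamically" is that every decision point offers the concrete tools plus the indecisive actions. The sketch below models that loop with placeholder action tokens (`<defer>`, `<ask_clarification>`, `<reselect_tool>`); the paper's actual action encoding is not specified on this page.

```python
import random

INDECISIVE = ["<defer>", "<ask_clarification>", "<reselect_tool>"]


def expand_action_space(tools: list[str]) -> list[str]:
    """At every decision point the policy may choose a concrete tool or
    an indecisive action; the tokens above are placeholders."""
    return tools + INDECISIVE


def rollout(tools: list[str], policy, max_steps: int = 5) -> list[str]:
    """Toy interaction loop: clarification and reselection keep the
    dialogue going; a concrete tool call or a deferral ends the turn."""
    trace = []
    for _ in range(max_steps):
        action = policy(expand_action_space(tools))
        trace.append(action)
        if action not in ("<ask_clarification>", "<reselect_tool>"):
            break  # committed to a concrete tool call, or deferred
    return trace


# Exercise the loop with a uniformly random policy.
print(rollout(["search", "calculator"], policy=random.choice))
```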
Authors

Hongshen Xu
Shanghai Jiao Tong University
Natural Language Processing · Large Language Model · LLM Alignment

Su Zhu
AISpeech Co., Ltd., Suzhou, China

Zihan Wang
X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China

Hang Zheng
Zhejiang University
array signal processing · DOA estimation · beamforming · tensor signal processing · machine learning

Da Ma
Assistant Professor, School of Medicine, Wake Forest University
Medical Image Computing · Computational Neuroanatomy · Radiogenomics · Neurodegenerative Disease

Ruisheng Cao
Shanghai Jiao Tong University
LLM Agent · text-to-SQL · code generation · semantic parsing · dialogue systems

Shuai Fan
AISpeech Co., Ltd., Suzhou, China

Lu Chen
X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China

Kai Yu
X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China