AgentHallu: Benchmarking Automated Hallucination Attribution of LLM-based Agents

πŸ“… 2026-01-11
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the critical challenge of hallucinations in multi-step reasoning by large language model (LLM) agents, which are difficult to trace and severely undermine system reliability. We introduce the first automated hallucination attribution task, aimed at identifying the specific reasoning steps that generate hallucinations and explaining their underlying causes. To support this, we construct AgentHallu, a high-quality benchmark spanning seven agent frameworks and five domains, accompanied by a fine-grained hallucination taxonomy comprising five major categories and fourteen subcategories, along with multi-level human annotations and causal explanations. Experimental results reveal that even state-of-the-art models achieve only 41.1% accuracy in step-level hallucination localization, with tool-use-related hallucinations proving especially challenging (11.6% accuracy), underscoring the difficulty and importance of the task.
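The headline numbers above are step-level localization accuracy, i.e. the fraction of trajectories for which a model points at the annotated hallucination-responsible step. A minimal sketch of how such a score could be computed is given below; the function name and input format are assumptions for illustration, not the paper's released evaluation code.

```python
# Minimal sketch, assuming each trajectory is annotated with the index of the
# hallucination-responsible step and the model predicts a single step index.
# Function and variable names are illustrative, not taken from the paper.
from typing import Optional, Sequence


def step_localization_accuracy(
    predicted_steps: Sequence[Optional[int]],
    gold_steps: Sequence[int],
) -> float:
    """Fraction of trajectories where the predicted step matches the annotated step."""
    if len(predicted_steps) != len(gold_steps):
        raise ValueError("predictions and gold labels must be aligned")
    correct = sum(int(p == g) for p, g in zip(predicted_steps, gold_steps))
    return correct / len(gold_steps)


# Toy example: 2 of 4 trajectories localized correctly -> 0.5
print(step_localization_accuracy([2, 5, 1, None], [2, 4, 1, 3]))
```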

πŸ“ Abstract
As LLM-based agents operate over sequential multi-step reasoning, hallucinations arising at intermediate steps risk propagating along the trajectory, thus degrading overall reliability. Unlike hallucination detection in single-turn responses, diagnosing hallucinations in multi-step workflows requires identifying which step causes the initial divergence. To fill this gap, we propose a new research task, automated hallucination attribution of LLM-based agents, aiming to identify the step responsible for the hallucination and explain why. To support this task, we introduce AgentHallu, a comprehensive benchmark with: (1) 693 high-quality trajectories spanning 7 agent frameworks and 5 domains, (2) a hallucination taxonomy organized into 5 categories (Planning, Retrieval, Reasoning, Human-Interaction, and Tool-Use) and 14 sub-categories, and (3) multi-level annotations curated by humans, covering binary labels, hallucination-responsible steps, and causal explanations. We evaluate 13 leading models, and results show the task is challenging even for top-tier models (like GPT-5, Gemini-2.5-Pro). The best-performing model achieves only 41.1% step localization accuracy, where tool-use hallucinations are the most challenging at just 11.6%. We believe AgentHallu will catalyze future research into developing robust, transparent, and reliable agentic systems.
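The multi-level annotations the abstract describes (a binary hallucination label, the responsible step, a taxonomy label, and a causal explanation) suggest a per-trajectory record roughly like the sketch below. The class and field names are assumptions for illustration; the benchmark's actual release format may differ.

```python
# Hypothetical per-trajectory annotation record for an AgentHallu-style
# benchmark. Field names are illustrative assumptions, not the published schema.
from dataclasses import dataclass
from typing import Optional

# The 5 major categories named in the abstract; the 14 sub-categories are not
# enumerated in the text, so subcategory is left as a free-form string.
CATEGORIES = ("Planning", "Retrieval", "Reasoning", "Human-Interaction", "Tool-Use")


@dataclass
class TrajectoryAnnotation:
    trajectory_id: str
    agent_framework: str                 # one of the 7 agent frameworks
    domain: str                          # one of the 5 domains
    has_hallucination: bool              # binary label
    responsible_step: Optional[int]      # index of the step where divergence starts
    category: Optional[str]              # one of CATEGORIES, if hallucinated
    subcategory: Optional[str]           # one of the 14 sub-categories
    causal_explanation: Optional[str]    # human-written explanation of the cause
```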
Problem

Research questions and friction points this paper is trying to address.

hallucination attribution
LLM-based agents
multi-step reasoning
agent reliability
step-level diagnosis
Innovation

Methods, ideas, or system contributions that make the work stand out.

hallucination attribution
LLM-based agents
multi-step reasoning
benchmark
causal explanation
Authors

Xuannan Liu
Beijing University of Posts and Telecommunications

Xiao Yang
Tsinghua University
Computer Vision, Machine Learning

Zekun Li
University of California, Santa Barbara

Peipei Li
Beijing University of Posts and Telecommunications (BUPT)
Computer Vision, Image Synthesis, Face Recognition

Ran He
Center for Research on Intelligent Perception and Computing, NLPR, CASIA