AI Summary
This work addresses the critical challenge of hallucinations in multi-step reasoning by large language model (LLM) agents, which are difficult to trace and severely undermine system reliability. We introduce the first automated hallucination attribution task, aimed at identifying the specific reasoning steps that generate hallucinations and explaining their underlying causes. To support this, we construct AgentHallu, a high-quality benchmark spanning seven reasoning frameworks and five domains, accompanied by a fine-grained hallucination taxonomy comprising five major categories and fourteen subcategories, along with multi-level human annotations and causal explanations. Experimental results reveal that even state-of-the-art models achieve only 41.1% accuracy in step-level hallucination localization, with tool-use-related hallucinations proving especially challenging (11.6% accuracy), underscoring the task's significant difficulty and research importance.
Abstract
As LLM-based agents operate over sequential multi-step reasoning, hallucinations arising at intermediate steps risk propagating along the trajectory, degrading overall reliability. Unlike hallucination detection in single-turn responses, diagnosing hallucinations in multi-step workflows requires identifying which step causes the initial divergence. To fill this gap, we propose a new research task, automated hallucination attribution for LLM-based agents, which aims to identify the step responsible for the hallucination and explain why it occurred. To support this task, we introduce AgentHallu, a comprehensive benchmark with: (1) 693 high-quality trajectories spanning 7 agent frameworks and 5 domains, (2) a hallucination taxonomy organized into 5 categories (Planning, Retrieval, Reasoning, Human-Interaction, and Tool-Use) and 14 sub-categories, and (3) multi-level human-curated annotations covering binary labels, hallucination-responsible steps, and causal explanations. We evaluate 13 leading models, and the results show the task is challenging even for top-tier models (e.g., GPT-5, Gemini-2.5-Pro). The best-performing model achieves only 41.1% step localization accuracy, and tool-use hallucinations are the most challenging at just 11.6%. We believe AgentHallu will catalyze future research into robust, transparent, and reliable agentic systems.
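To make the multi-level annotation scheme and the step-localization metric concrete, here is a minimal sketch in Python. The field names, record layout, and the accuracy helper are illustrative assumptions, not the actual AgentHallu schema; only the three annotation levels (binary label, responsible step, causal explanation) come from the abstract.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrajectoryAnnotation:
    """Hypothetical multi-level annotation for one agent trajectory."""
    trajectory_id: str
    has_hallucination: bool            # level 1: binary label
    responsible_step: Optional[int]    # level 2: step that first diverges (1-indexed)
    category: Optional[str]            # one of the 5 top-level categories
    explanation: Optional[str]         # level 3: free-text causal explanation

def step_localization_accuracy(gold: list, predicted_steps: list) -> float:
    """Fraction of hallucinated trajectories whose responsible step is matched."""
    pairs = [(g, p) for g, p in zip(gold, predicted_steps) if g.has_hallucination]
    if not pairs:
        return 0.0
    hits = sum(1 for g, p in pairs if p == g.responsible_step)
    return hits / len(pairs)

gold = [
    TrajectoryAnnotation("traj-001", True, 3, "Tool-Use",
                         "The agent misread the tool's JSON output at step 3."),
    TrajectoryAnnotation("traj-002", False, None, None, None),
    TrajectoryAnnotation("traj-003", True, 2, "Planning",
                         "The plan at step 2 omitted a required sub-goal."),
]
# One of the two hallucinated trajectories is localized correctly.
print(step_localization_accuracy(gold, [3, None, 5]))  # -> 0.5
```

Non-hallucinated trajectories are excluded from the denominator here, which is one plausible reading of "step localization accuracy"; the paper may define the metric differently.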