AI Summary
This work addresses the critical challenge of hallucinations in multi-step reasoning by large language model (LLM) agents, which are difficult to trace and severely undermine system reliability. We introduce the first automated hallucination attribution task, aimed at identifying the specific reasoning steps that generate hallucinations and explaining their underlying causes. To support this, we construct AgentHallu, a high-quality benchmark spanning seven reasoning frameworks and five domains, accompanied by a fine-grained hallucination taxonomy comprising five major categories and fourteen subcategories, along with multi-level human annotations and causal explanations. Experimental results reveal that even state-of-the-art models achieve only 41.1% accuracy in step-level hallucination localization, with tool-use-related hallucinations proving especially challenging (11.6% accuracy), underscoring the task's significant difficulty and research importance.
Abstract
As LLM-based agents operate over sequential multi-step reasoning, hallucinations arising at intermediate steps risk propagating along the trajectory, degrading overall reliability. Unlike hallucination detection in single-turn responses, diagnosing hallucinations in multi-step workflows requires identifying which step causes the initial divergence. To fill this gap, we propose a new research task, automated hallucination attribution for LLM-based agents, which aims to identify the step responsible for the hallucination and explain why it occurred. To support this task, we introduce AgentHallu, a comprehensive benchmark with: (1) 693 high-quality trajectories spanning 7 agent frameworks and 5 domains, (2) a hallucination taxonomy organized into 5 categories (Planning, Retrieval, Reasoning, Human-Interaction, and Tool-Use) and 14 sub-categories, and (3) multi-level human-curated annotations covering binary labels, hallucination-responsible steps, and causal explanations. We evaluate 13 leading models, and the results show the task is challenging even for top-tier models (e.g., GPT-5, Gemini-2.5-Pro). The best-performing model achieves only 41.1% step localization accuracy, and tool-use hallucinations are the most challenging at just 11.6%. We believe AgentHallu will catalyze future research into robust, transparent, and reliable agentic systems.
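To make the multi-level annotation scheme and the step-localization metric concrete, here is a minimal sketch in Python. The field names, record layout, and the accuracy helper are illustrative assumptions, not the actual AgentHallu schema; only the three annotation levels (binary label, responsible step, causal explanation) come from the abstract.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrajectoryAnnotation:
    """Hypothetical multi-level annotation for one agent trajectory."""
    trajectory_id: str
    has_hallucination: bool            # level 1: binary label
    responsible_step: Optional[int]    # level 2: step that first diverges (1-indexed)
    category: Optional[str]            # one of the 5 top-level categories
    explanation: Optional[str]         # level 3: free-text causal explanation

def step_localization_accuracy(gold: list, predicted_steps: list) -> float:
    """Fraction of hallucinated trajectories whose responsible step is matched."""
    pairs = [(g, p) for g, p in zip(gold, predicted_steps) if g.has_hallucination]
    if not pairs:
        return 0.0
    hits = sum(1 for g, p in pairs if p == g.responsible_step)
    return hits / len(pairs)

gold = [
    TrajectoryAnnotation("traj-001", True, 3, "Tool-Use",
                         "The agent misread the tool's JSON output at step 3."),
    TrajectoryAnnotation("traj-002", False, None, None, None),
    TrajectoryAnnotation("traj-003", True, 2, "Planning",
                         "The plan at step 2 omitted a required sub-goal."),
]
# One of the two hallucinated trajectories is localized correctly.
print(step_localization_accuracy(gold, [3, None, 5]))  # -> 0.5
```

Non-hallucinated trajectories are excluded from the denominator here, which is one plausible reading of "step localization accuracy"; the paper may define the metric differently.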