🤖 AI Summary
Large language model (LLM) agents suffer from pervasive hallucinations, stemming from distorted perception of instructions, interaction history, or environmental states, yet existing evaluations remain fragmented and lack systematic benchmarks. Method: We introduce MIRAGE-Bench, the first hallucination benchmark tailored for interactive LLM agents. It (1) establishes a three-dimensional hallucination taxonomy covering instruction understanding, history modeling, and environment perception; (2) employs a snapshot-based strategy that freezes decision points, ensuring reproducible test cases; and (3) proposes a risk-aware, fine-grained LLM-as-a-Judge evaluation paradigm, enhanced by domain-specific prompt engineering to improve judgment reliability. Results: Experiments uncover diverse, recurrent hallucination patterns, deliver actionable root-cause analysis, and suggest concrete mitigation strategies, laying the foundation for rigorous, trustworthy evaluation of LLM agents in real-world interactive settings.
📝 Abstract
Hallucinations pose critical risks for large language model (LLM)-based agents, often manifesting as hallucinative actions resulting from fabricated or misinterpreted information within the cognitive context. While recent studies have exposed such failures, existing evaluations remain fragmented and lack a principled testbed. In this paper, we present MIRAGE-Bench (Measuring Illusions in Risky AGEnt settings), the first unified benchmark for eliciting and evaluating hallucinations in interactive LLM-agent scenarios. We begin by introducing a three-part taxonomy of agentic hallucinations: actions that are unfaithful to (i) task instructions, (ii) execution history, or (iii) environment observations. To elicit such failures, we first perform a systematic audit of existing agent benchmarks, then synthesize test cases using a snapshot strategy that isolates decision points in a deterministic and reproducible manner. To evaluate hallucination behaviors, we adopt a fine-grained LLM-as-a-Judge paradigm with tailored risk-aware prompts, enabling scalable, high-fidelity assessment of agent actions without enumerating full action spaces. MIRAGE-Bench provides actionable insights into the failure modes of LLM agents and lays the groundwork for principled progress in mitigating hallucinations in interactive environments.