🤖 AI Summary
This work addresses the lack of fine-grained localization of hallucination sources during text generation in current large language model evaluations. It reframes hallucination assessment as a diagnostic problem and introduces PRISM, a novel benchmark that systematically decomposes hallucinations into four interpretable, generation-stage-aligned dimensions: knowledge absence, knowledge corruption, reasoning errors, and instruction-following failures. PRISM encompasses 65 controlled tasks and 9,448 instances, enabling stage-level evaluation across both mainstream open- and closed-source models. Experiments on 24 models reveal pervasive trade-offs among instruction adherence, memory retrieval, and logical reasoning, indicating that prevailing mitigation strategies often improve performance along one dimension at the expense of others.
📝 Abstract
As large language models (LLMs) evolve from conversational assistants into agents capable of handling complex tasks, they are increasingly deployed in high-risk domains. However, existing benchmarks largely rely on mixed queries and post hoc, output-level scoring, which quantifies hallucination severity but offers limited insight into where and why hallucinations arise in the generation pipeline. We therefore reformulate hallucination evaluation as a diagnostic problem and propose PRISM, a controlled benchmark that disentangles hallucinations into four dimensions (knowledge missing, knowledge errors, reasoning errors, and instruction-following errors) grounded in three stages of generation: memory, instruction, and reasoning. PRISM contains 9,448 instances across 65 tasks and supports fine-grained, stage-aware diagnostic evaluation. Evaluating 24 mainstream open-source and proprietary LLMs, we uncover consistent trade-offs among instruction following, memory retrieval, and logical reasoning, showing that mitigation strategies often improve specific dimensions at the expense of others. We hope PRISM provides a framework for understanding the specific mechanisms behind LLM hallucinations, ultimately accelerating the development of trustworthy large language models.
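To illustrate why a single output-level score can hide the trade-offs described above, here is a minimal sketch contrasting an overall hallucination rate with a per-dimension breakdown. The `Result` record format, the snake_case dimension identifiers, and the toy numbers are all hypothetical assumptions for illustration; they are not PRISM's actual data format, scoring protocol, or reported results.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical identifiers mirroring the four dimensions named in the abstract.
DIMENSIONS = (
    "knowledge_missing",
    "knowledge_error",
    "reasoning_error",
    "instruction_following_error",
)


@dataclass
class Result:
    task_id: str        # hypothetical instance identifier
    dimension: str      # which dimension this instance probes
    hallucinated: bool  # whether the output was judged a hallucination

    def __post_init__(self):
        if self.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {self.dimension}")


def overall_rate(results):
    """Output-level score: fraction of all instances judged hallucinated."""
    return sum(r.hallucinated for r in results) / len(results)


def per_dimension_rates(results):
    """Stage-aware breakdown: hallucination rate within each dimension."""
    counts = defaultdict(lambda: [0, 0])  # dimension -> [hallucinated, total]
    for r in results:
        counts[r.dimension][0] += r.hallucinated
        counts[r.dimension][1] += 1
    return {d: h / n for d, (h, n) in counts.items()}


def toy_results(profile):
    """Build invented results from {dimension: (hallucinated, total)}."""
    out = []
    for dim, (h, n) in profile.items():
        out += [Result(f"{dim}-{i}", dim, i < h) for i in range(n)]
    return out


# Two invented models with identical overall rates but different profiles.
model_a = toy_results({
    "knowledge_missing": (1, 10),
    "knowledge_error": (1, 10),
    "reasoning_error": (5, 10),
    "instruction_following_error": (1, 10),
})
model_b = toy_results({
    "knowledge_missing": (2, 10),
    "knowledge_error": (2, 10),
    "reasoning_error": (2, 10),
    "instruction_following_error": (2, 10),
})

print(overall_rate(model_a), overall_rate(model_b))  # 0.2 0.2
print(per_dimension_rates(model_a)["reasoning_error"])  # 0.5
```

Both toy models score 0.2 overall, yet model A concentrates its failures in reasoning while model B spreads them evenly; only the per-dimension view exposes that difference.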