PRISM: Probing Reasoning, Instruction, and Source Memory in LLM Hallucinations

📅 2026-04-18

🤖 AI Summary

This work addresses the lack of fine-grained localization of hallucination sources during text generation in current large language model evaluations. It reframes hallucination assessment as a diagnostic problem and introduces PRISM, a novel benchmark that systematically decomposes hallucinations into four interpretable, generation-stage-aligned dimensions: knowledge absence, knowledge corruption, reasoning errors, and instruction-following failures. PRISM encompasses 65 controlled tasks and 9,448 instances, enabling stage-level evaluation across both mainstream open- and closed-source models. Experiments on 24 models reveal pervasive trade-offs among instruction adherence, memory retrieval, and logical reasoning, indicating that prevailing mitigation strategies often improve performance along one dimension at the expense of others.

📝 Abstract

As large language models (LLMs) evolve from conversational assistants into agents capable of handling complex tasks, they are increasingly deployed in high-risk domains. However, existing benchmarks largely rely on mixed queries and post-hoc, output-level scoring, which quantifies hallucination severity but offers limited insight into where and why hallucinations arise in the generation pipeline. We therefore reformulate hallucination evaluation as a diagnostic problem and propose PRISM, a controlled benchmark that disentangles hallucinations into four dimensions (missing knowledge, knowledge errors, reasoning errors, and instruction-following errors) grounded in three stages of generation (memory, instruction, and reasoning). PRISM contains 9,448 instances across 65 tasks and supports fine-grained, stage-aware diagnostic evaluation. Evaluating 24 mainstream open-source and proprietary LLMs, we uncover consistent trade-offs among instruction following, memory retrieval, and logical reasoning, showing that mitigation strategies often improve specific dimensions at the expense of others. We hope PRISM provides a framework for understanding the specific mechanisms behind LLM hallucinations, ultimately accelerating the development of trustworthy large language models.
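
To make the contrast with a single output-level score concrete, the following is a minimal sketch of what per-dimension reporting could look like. It is not PRISM's actual scoring code: the `Instance` record, the dimension label strings, and the use of plain accuracy are all assumptions introduced here for illustration; only the four dimension names come from the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass

# The four dimensions named in the abstract. The exact label strings are
# hypothetical; the benchmark may encode them differently.
DIMENSIONS = (
    "knowledge_missing",
    "knowledge_error",
    "reasoning_error",
    "instruction_error",
)


@dataclass
class Instance:
    """One graded benchmark item (hypothetical record layout)."""
    task: str        # task identifier
    dimension: str   # which of the four dimensions the item probes
    correct: bool    # whether the model's answer was judged correct


def per_dimension_accuracy(instances):
    """Return accuracy per dimension instead of one pooled score.

    Dimensions with no instances are omitted from the result.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for inst in instances:
        if inst.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {inst.dimension!r}")
        totals[inst.dimension] += 1
        hits[inst.dimension] += int(inst.correct)
    return {d: hits[d] / totals[d] for d in DIMENSIONS if totals[d]}
```

Reporting one number per dimension is what lets a trade-off show up at all: a model that gains on instruction following while losing on reasoning could look unchanged under a pooled score.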

Problem

Research questions and friction points this paper is trying to address.

hallucination

diagnostic evaluation

generation pipeline

large language models

benchmark

Innovation

Methods, ideas, or system contributions that make the work stand out.

hallucination diagnosis

controlled benchmark

stage-aware evaluation