🤖 AI Summary
This work addresses the challenge of hallucinations in vision-language models (VLMs)—factually incorrect outputs that undermine reliable deployment—by conceptualizing hallucination as a dynamic pathology in the model’s cognitive computation. The authors propose a multi-stage diagnostic framework grounded in computational rationality, which maps the generation process into a low-dimensional cognitive state space using information-theoretic probes. Leveraging a “geometric–information duality” principle, the framework reformulates hallucination detection as geometric anomaly detection along cognitive trajectories, enabling interpretable attribution to three core pathologies: perceptual instability, logical-causal failure, and decisional ambiguity. Integrated with weakly supervised learning and contamination-robust calibration, the method achieves state-of-the-art performance on benchmarks including POPE, MME, and MS-COCO, demonstrating high efficiency, adaptability under weak supervision, and strong robustness.
📝 Abstract
Vision-Language Models (VLMs) frequently "hallucinate" - generate plausible yet factually incorrect statements - posing a critical barrier to their trustworthy deployment. In this work, we propose a new paradigm for diagnosing hallucinations, recasting them from static output errors into dynamic pathologies of a model's computational cognition. Our framework is grounded in a normative principle of computational rationality, allowing us to model a VLM's generation as a dynamic cognitive trajectory. We design a suite of information-theoretic probes that project this trajectory onto an interpretable, low-dimensional Cognitive State Space. Our central discovery is a governing principle we term the geometric-information duality: a cognitive trajectory's geometric abnormality within this space is fundamentally equivalent to its high information-theoretic surprisal. Hallucination detection is thus cast as a geometric anomaly detection problem. Evaluated across diverse settings - from rigorous binary QA (POPE) and comprehensive reasoning (MME) to unconstrained open-ended captioning (MS-COCO) - our framework achieves state-of-the-art performance. Crucially, it operates with high efficiency under weak supervision and remains highly robust even when calibration data is heavily contaminated. This approach enables a causal attribution of failures, mapping observable errors to distinct pathological states: perceptual instability (measured by Perceptual Entropy), logical-causal failure (measured by Inferential Conflict), and decisional ambiguity (measured by Decision Entropy). Ultimately, this opens a path toward building AI systems whose reasoning is transparent, auditable, and diagnosable by design.
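To make the pipeline described above concrete, the sketch below is a minimal, hedged illustration (not the authors' implementation): it uses per-step token-distribution entropy as a stand-in for the information-theoretic probes, stacks entropy and its step-to-step change into a toy two-dimensional "cognitive trajectory", and scores geometric abnormality with a Mahalanobis distance against calibration trajectories. All function names, the feature choice, and the threshold are hypothetical; the paper's actual probes (Perceptual Entropy, Inferential Conflict, Decision Entropy) and state space are not specified here.

```python
import numpy as np

def step_entropy(probs, eps=1e-12):
    """Shannon entropy (nats) of one decoding step's token distribution."""
    p = np.clip(probs, eps, 1.0)
    return float(-np.sum(p * np.log(p)))

def cognitive_trajectory(step_distributions):
    """Project a generation run onto a toy 2-D 'cognitive state space':
    (per-step entropy, change in entropy from the previous step)."""
    ent = np.array([step_entropy(p) for p in step_distributions])
    delta = np.diff(ent, prepend=ent[0])
    return np.stack([ent, delta], axis=1)  # shape: (num_steps, 2)

def mahalanobis_anomaly(traj, calib_trajs):
    """Geometric abnormality of a trajectory: mean Mahalanobis distance
    of its points from the distribution of calibration trajectory points."""
    calib = np.concatenate(calib_trajs, axis=0)
    mu = calib.mean(axis=0)
    cov = np.cov(calib, rowvar=False) + 1e-6 * np.eye(calib.shape[1])
    inv = np.linalg.inv(cov)
    diffs = traj - mu
    dists = np.sqrt(np.einsum("ti,ij,tj->t", diffs, inv, diffs))
    return float(dists.mean())

# Toy usage: flag a run whose anomaly score exceeds a calibrated threshold.
rng = np.random.default_rng(0)
calib_runs = [cognitive_trajectory(rng.dirichlet(np.ones(50) * 5, size=20))
              for _ in range(30)]                      # "normal" generations
test_run = cognitive_trajectory(rng.dirichlet(np.ones(50) * 0.2, size=20))
score = mahalanobis_anomaly(test_run, calib_runs)
print("anomaly score:", score, "| hallucination suspected:", score > 2.0)
```

Under the geometric-information duality claimed in the abstract, a trajectory scoring high on such a geometric measure should coincide with high information-theoretic surprisal, which is what would let a calibration set of ordinary generations serve as the reference distribution for detection.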