🤖 AI Summary
This work addresses hallucination detection in large language models (LLMs), which often generate plausible yet factually incorrect outputs. Existing approaches either rely on external knowledge sources or incur substantial computational overhead from repeated sampling. The paper introduces a framework that treats the LLM as a black-box dynamical system, mapping its responses into a high-dimensional manifold via an embedding model. Using Koopman operator theory, it fits separate transition operators for factual and hallucinated regimes and constructs a detection score from the difference between their prediction errors. A preference-aware calibration mechanism then sets the classification threshold from a small set of demonstrations. Because the score is computed from a single sampled response, the method needs neither secondary sampling nor external grounding. On three benchmark datasets, the authors report state-of-the-art detection performance with reduced computational cost.
📝 Abstract
Large Language Models (LLMs) frequently generate plausible but non-factual content, a phenomenon known as hallucination. While existing detection methods typically rely on computationally expensive sampling-based consistency checks or external knowledge retrieval, we propose a new method that treats the LLM as a black-box dynamical system. By projecting LLM responses into a high-dimensional manifold via an embedding model, we characterize the resulting vector sequences as observable realizations of the model's latent state-space dynamics. Leveraging Koopman operator theory, we fit transition operators for both the factual and hallucinated regimes and define a differential residual score based on their respective prediction errors. To accommodate varying user requirements and domain-specific sensitivities, we introduce a preference-aware calibration mechanism that optimizes the classification threshold based on a small set of demonstrations. This approach enables low-cost hallucination detection from a single sampled response, avoiding the need for secondary sampling or external grounding. Experiments on three benchmark datasets demonstrate that our method achieves state-of-the-art performance with reduced resource overhead.
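
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: each response is taken to be a sequence of embedding vectors, the Koopman operator is approximated by a ridge-regularized linear least-squares fit on the raw embeddings (a DMD-style estimate with identity observables), and "preference-aware" calibration is modelled as minimizing a cost-weighted error count on labelled demonstrations. All function names (`fit_operator`, `residual`, `differential_score`, `calibrate_threshold`, `simulate`) are hypothetical, and the synthetic linear dynamics stand in for real embedded responses.

```python
import numpy as np


def fit_operator(sequences, ridge=1e-3):
    """Ridge least-squares estimate of K with x[t+1] ~ x[t] @ K (DMD-style)."""
    X = np.vstack([s[:-1] for s in sequences])  # states at time t
    Y = np.vstack([s[1:] for s in sequences])   # states at time t+1
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ Y)


def residual(seq, K):
    """Mean squared one-step prediction error of one sequence under K."""
    return float(np.mean(np.sum((seq[1:] - seq[:-1] @ K) ** 2, axis=1)))


def differential_score(seq, K_fact, K_hall):
    """Positive when the hallucinated-regime operator predicts seq better."""
    return residual(seq, K_fact) - residual(seq, K_hall)


def calibrate_threshold(scores, labels, fp_cost=1.0, fn_cost=1.0):
    """Choose the threshold minimizing a weighted error cost on demonstrations.

    labels: 1 = hallucinated, 0 = factual. The cost weights are an assumed
    stand-in for user preference, not the paper's stated mechanism.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    s = np.sort(scores)
    # Candidate thresholds: midpoints between neighbouring scores, plus extremes.
    candidates = np.concatenate(([-np.inf], (s[:-1] + s[1:]) / 2, [np.inf]))
    best_t, best_cost = candidates[0], np.inf
    for t in candidates:
        flagged = scores > t  # flagged as hallucinated
        cost = (fp_cost * np.sum(flagged & (labels == 0))
                + fn_cost * np.sum(~flagged & (labels == 1)))
        if cost < best_cost:
            best_t, best_cost = t, cost
    return float(best_t)


def simulate(A, n_seq, length, rng, noise=0.01):
    """Synthetic stand-in for embedded responses: noisy linear dynamics."""
    seqs = []
    for _ in range(n_seq):
        x = rng.normal(size=A.shape[0])
        x /= np.linalg.norm(x)  # unit-norm start, like normalized embeddings
        steps = [x]
        for _ in range(length - 1):
            x = x @ A + noise * rng.normal(size=x.shape)
            steps.append(x)
        seqs.append(np.array(steps))
    return seqs


rng = np.random.default_rng(0)
d = 8
# Two different (assumed) regimes: scaled random orthogonal transition matrices.
A_fact = 0.95 * np.linalg.qr(rng.normal(size=(d, d)))[0]
A_hall = 0.95 * np.linalg.qr(rng.normal(size=(d, d)))[0]

K_fact = fit_operator(simulate(A_fact, 40, 12, rng))
K_hall = fit_operator(simulate(A_hall, 40, 12, rng))

# A small labelled demonstration set used only for threshold calibration.
demos = simulate(A_fact, 10, 12, rng) + simulate(A_hall, 10, 12, rng)
demo_labels = [0] * 10 + [1] * 10
demo_scores = [differential_score(s, K_fact, K_hall) for s in demos]
threshold = calibrate_threshold(demo_scores, demo_labels)
print("calibrated threshold:", threshold)
```

At detection time, a new response would be embedded once, scored with `differential_score`, and flagged when the score exceeds `threshold`. Raising `fn_cost` relative to `fp_cost` shifts the threshold toward flagging more responses, which is one plausible way to encode a preference for recall over precision.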