EigenTrack: Spectral Activation Feature Tracking for Hallucination and Out-of-Distribution Detection in LLMs and VLMs

📅 2025-09-19
🤖 AI Summary
Large language models (LLMs) and vision-language models (VLMs) frequently suffer from hallucinations and out-of-distribution (OOD) errors, yet existing detection methods lack real-time capability or interpretability, or impose prohibitive computational overhead. Method: We propose a lightweight, real-time detection framework leveraging geometric properties of hidden-layer activation spectra. Our approach constructs a compact global spectral signature by fusing spectral entropy, eigenvalue gaps, and KL divergence of the activation covariance matrix, and models temporal dependencies via a lightweight recurrent neural network, without resampling, model fine-tuning, or external data. Contribution/Results: The method achieves millisecond-level latency while preserving contextual and global signal integrity. It delivers interpretable early warnings and significantly outperforms state-of-the-art black-box, gray-box, and white-box baselines on both hallucination and OOD detection, combining high accuracy with real-time inference.
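For readers who want the three statistics named above in symbols, the following uses their standard textbook forms. The window length $T$, hidden width $d$, the index $k$, and the reference spectrum $q$ are notation assumed here for illustration; the summary does not state the exact definitions or the baseline the authors use.

```latex
% Assumed notation: H is a window of T hidden activations of width d,
% \tilde{H} is H with its column means removed.
\[
C = \frac{1}{T-1}\,\tilde{H}^{\top}\tilde{H},
\qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \ge 0
\quad\text{(eigenvalues of } C\text{)},
\qquad
p_i = \frac{\lambda_i}{\sum_{j=1}^{d}\lambda_j}.
\]
\[
\text{Spectral entropy:}\quad S = -\sum_{i=1}^{d} p_i \log p_i,
\qquad
\text{Eigenvalue gap:}\quad \Delta_k = \lambda_k - \lambda_{k+1},
\]
\[
\text{KL divergence from a reference spectrum } q:\quad
D_{\mathrm{KL}}(p \,\|\, q) = \sum_{i=1}^{d} p_i \log\frac{p_i}{q_i}.
\]
```

Under these definitions, a low $S$ or a large $\Delta_1$ means the activation variance is concentrated in a few directions, and a large $D_{\mathrm{KL}}$ means the spectrum is far from the reference.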

📝 Abstract
Large language models (LLMs) offer broad utility but remain prone to hallucination and out-of-distribution (OOD) errors. We propose EigenTrack, an interpretable real-time detector that uses the spectral geometry of hidden activations, a compact global signature of model dynamics. By streaming covariance-spectrum statistics such as entropy, eigenvalue gaps, and KL divergence from random baselines into a lightweight recurrent classifier, EigenTrack tracks temporal shifts in representation structure that signal hallucination and OOD drift before surface errors appear. Unlike black- and grey-box methods, it needs only a single forward pass without resampling. Unlike existing white-box detectors, it preserves temporal context, aggregates global signals, and offers interpretable accuracy-latency trade-offs.
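The covariance-spectrum statistics mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the window shape, and the choice of a variance-matched Gaussian matrix as the "random baseline" are assumptions made for illustration only.

```python
import numpy as np

def normalized_spectrum(window, eps=1e-12):
    """Covariance eigenvalues of a (tokens, hidden_dim) window, plus their normalized form."""
    x = np.asarray(window, dtype=np.float64)
    x = x - x.mean(axis=0, keepdims=True)
    # Squared singular values of the centered window are the nonzero
    # covariance eigenvalues (up to the 1/(T-1) factor), in descending order.
    s = np.linalg.svd(x, compute_uv=False)
    eig = s ** 2 / max(x.shape[0] - 1, 1)
    return eig, eig / (eig.sum() + eps)

def spectral_features(window, rng, eps=1e-12):
    """Entropy, top eigengap, and KL divergence to an assumed Gaussian baseline."""
    eig, p = normalized_spectrum(window, eps)
    # Assumption: the baseline is i.i.d. Gaussian noise with the same shape
    # and overall scale as the window.
    scale = float(np.std(window)) or 1.0
    noise = rng.normal(0.0, scale, size=np.shape(window))
    _, q = normalized_spectrum(noise, eps)
    return {
        "entropy": float(-np.sum(p * np.log(p + eps))),
        "eigengap": float(eig[0] - eig[1]),
        "kl_to_random": float(np.sum(p * np.log((p + eps) / (q + eps)))),
    }

# Usage: an unstructured window versus one dominated by a single direction.
rng = np.random.default_rng(0)
noise_window = rng.normal(size=(32, 16))
direction = rng.normal(size=(1, 16))
low_rank_window = rng.normal(size=(32, 1)) @ direction + 0.01 * rng.normal(size=(32, 16))
f_noise = spectral_features(noise_window, rng)
f_low = spectral_features(low_rank_window, rng)
```

In this toy setup the structured window has lower spectral entropy and a larger divergence from the baseline than the unstructured one, which is the kind of shift such features are meant to expose.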
Problem

Research questions and friction points this paper is trying to address.

Detects hallucination and out-of-distribution errors in LLMs
Uses spectral geometry of hidden activations for real-time monitoring
Tracks temporal representation shifts before surface errors appear
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses spectral geometry of hidden activations
Streams covariance-spectrum statistics into classifier
Single forward pass without resampling needed
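The streaming idea in the points above can be sketched as a sliding window over the hidden states from one forward pass, with each window's spectral features fed to a small recurrent cell. Everything here is a hypothetical illustration: the source only says "lightweight recurrent classifier", so the GRU cell, the two-feature input, the window and stride sizes, and the random untrained weights are all assumptions, and the scores it produces carry no meaning until such a model is trained.

```python
import numpy as np

def window_features(window, eps=1e-12):
    """Spectral entropy and top eigengap of one (tokens, hidden_dim) window."""
    x = window - window.mean(axis=0, keepdims=True)
    eig = np.linalg.svd(x, compute_uv=False) ** 2 / max(len(x) - 1, 1)
    p = eig / (eig.sum() + eps)
    return np.array([-np.sum(p * np.log(p + eps)), eig[0] - eig[1]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyGRU:
    """One GRU cell with a logistic readout; weights are random placeholders."""

    def __init__(self, n_in, n_hidden, rng):
        def init(*shape):
            return rng.normal(0.0, 0.1, size=shape)
        self.Wz, self.Uz, self.bz = init(n_hidden, n_in), init(n_hidden, n_hidden), np.zeros(n_hidden)
        self.Wr, self.Ur, self.br = init(n_hidden, n_in), init(n_hidden, n_hidden), np.zeros(n_hidden)
        self.Wh, self.Uh, self.bh = init(n_hidden, n_in), init(n_hidden, n_hidden), np.zeros(n_hidden)
        self.w_out, self.b_out = init(n_hidden), 0.0
        self.h = np.zeros(n_hidden)  # state carried across windows

    def step(self, x):
        z = sigmoid(self.Wz @ x + self.Uz @ self.h + self.bz)              # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ self.h + self.br)              # reset gate
        h_new = np.tanh(self.Wh @ x + self.Uh @ (r * self.h) + self.bh)    # candidate state
        self.h = (1.0 - z) * self.h + z * h_new
        return float(sigmoid(self.w_out @ self.h + self.b_out))            # score in (0, 1)

def stream_scores(hidden_states, window=16, stride=4, seed=0):
    """Score each sliding window of hidden states from a single forward pass."""
    gru = TinyGRU(n_in=2, n_hidden=8, rng=np.random.default_rng(seed))
    scores = []
    for end in range(window, len(hidden_states) + 1, stride):
        scores.append(gru.step(window_features(hidden_states[end - window:end])))
    return scores

# Usage with synthetic stand-in activations (64 tokens, width 32).
hidden_states = np.random.default_rng(1).normal(size=(64, 32))
scores = stream_scores(hidden_states)
```

Because the recurrent state persists between windows, each score depends on the history of spectral features and not only on the current window, which is the property the temporal-context claim relies on.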