🤖 AI Summary
Existing hallucination detection methods for large language models (LLMs) generalize poorly and are tightly coupled to a single model. Method: ACT-ViT is presented as the first approach to apply a Vision Transformer to activation analysis, treating the joint layer-token activation tensor as an image-like structure for cross-LLM hallucination detection. Rather than probing a single layer and token in isolation, ACT-ViT operates on the full tensor and supports joint training across multiple LLMs as well as transfer learning to new ones. Contribution/Results: Evaluated across diverse LLMs (e.g., Llama, Qwen, Phi) and standard benchmarks, ACT-ViT consistently outperforms state-of-the-art probe-based classifiers, achieves strong zero-shot generalization to unseen datasets, transfers effectively to new LLMs via fine-tuning, and remains efficient enough for practical deployment, advancing generic, architecture-agnostic hallucination detection.
📝 Abstract
Detecting hallucinations in Large Language Model-generated text is crucial for their safe deployment. While probing classifiers show promise, they operate on isolated layer-token pairs and are LLM-specific, limiting their effectiveness and hindering cross-LLM applications. In this paper, we introduce a novel approach to address these shortcomings. We build on the natural sequential structure of activation data in both axes (layers $\times$ tokens) and advocate treating full activation tensors akin to images. We design ACT-ViT, a Vision Transformer-inspired model that can be effectively and efficiently applied to activation tensors and supports training on data from multiple LLMs simultaneously. Through comprehensive experiments encompassing diverse LLMs and datasets, we demonstrate that ACT-ViT consistently outperforms traditional probing techniques while remaining extremely efficient for deployment. In particular, we show that our architecture benefits substantially from multi-LLM training, achieves strong zero-shot performance on unseen datasets, and can be transferred effectively to new LLMs through fine-tuning. Full code is available at https://github.com/BarSGuy/ACT-ViT.
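To make the "activation tensor as image" idea concrete, below is a minimal PyTorch sketch of a ViT-style probe over a (layers × tokens × hidden_dim) activation tensor. It is not the authors' implementation (see the linked repository for that): the class name `ActViTSketch`, all shapes, hyperparameters, and the omission of positional embeddings over the layer-token grid are illustrative assumptions. Each (layer, token) activation vector is projected to a "patch" embedding, the grid is flattened into a sequence, and a Transformer encoder with a classification token produces a hallucination score.

```python
import torch
import torch.nn as nn


class ActViTSketch(nn.Module):
    """Hypothetical ViT-style hallucination probe over an LLM's
    (layers x tokens x hidden_dim) activation tensor.
    Names and hyperparameters are illustrative, not the paper's code."""

    def __init__(self, hidden_dim=4096, embed_dim=256, depth=4, num_heads=8):
        super().__init__()
        # Project each (layer, token) activation vector to a patch embedding,
        # analogous to a ViT patch projection over an image grid.
        self.patch_proj = nn.Linear(hidden_dim, embed_dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, 1)  # hallucination logit

    def forward(self, acts):
        # acts: (batch, num_layers, num_tokens, hidden_dim)
        b, num_layers, num_tokens, _ = acts.shape
        # Flatten the layer-token grid into a sequence of "patches".
        x = self.patch_proj(acts).reshape(b, num_layers * num_tokens, -1)
        cls = self.cls_token.expand(b, -1, -1)
        x = torch.cat([cls, x], dim=1)
        x = self.encoder(x)
        # Score the example from the classification token.
        return self.head(x[:, 0]).squeeze(-1)


# Example: score two activation tensors from a hypothetical 32-layer LLM
# with 16 generated tokens and hidden size 4096.
probe = ActViTSketch()
dummy_acts = torch.randn(2, 32, 16, 4096)
print(probe(dummy_acts).shape)  # torch.Size([2])
```

Because the probe only consumes activation shapes, not model weights, the same sketch could in principle ingest tensors from different LLMs (padding or projecting mismatched dimensions), which is what makes joint multi-LLM training plausible; the actual cross-LLM handling in ACT-ViT is described in the paper and repository.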