🤖 AI Summary
This work addresses the inefficiency of recent hybrid hallucination detectors for large language models (LLMs), such as HaMI, which rely on repeated sampling and computationally expensive semantic similarity calculations. The authors first analyze HaMI in terms of decision margins and show that scaling internal states with semantic consistency enlarges the decision margin. From this margin-enlargement perspective, they propose an efficient detection framework that aggregates token-level features via max pooling and applies Multiple Instance Learning (MIL) with a lightweight multilayer perceptron (MLP) to predict sentence-level hallucination scores directly, with no explicit semantic consistency computation. The resulting detector performs competitively with state-of-the-art baselines while substantially reducing computational overhead.
📝 Abstract
Hallucination detection has become increasingly important for improving the reliability of large language models (LLMs). Recently, hybrid approaches such as HaMI, which combine semantic consistency with internal model states via Multiple Instance Learning (MIL), have achieved state-of-the-art performance. However, these methods incur substantial computational overhead due to repeated sampling and costly semantic similarity computations. In this work, we first provide a theoretical analysis of HaMI in terms of decision margins, revealing that scaling internal states with semantic consistency leads to an enlarged decision margin. Motivated by this insight, we revisit classical sentence classification models from a margin enlargement perspective, aggregating token-level features via max pooling and directly estimating sentence scores using a lightweight MLP. Without requiring semantic consistency computations, our approach achieves substantial efficiency improvements while maintaining competitive performance with state-of-the-art baselines through adaptive aggregation of internal feature representations.
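The scoring path described in the abstract (max pooling over token-level features, then a lightweight MLP that outputs a sentence score) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`max_pool`, `init_mlp`, `sentence_score`), the single hidden layer with ReLU, the sigmoid output, and the random untrained weights are all hypothetical, and the small hand-written vectors stand in for the LLM internal states that the paper extracts. Training is omitted.

```python
import math
import random


def max_pool(token_features):
    """Element-wise max over tokens: T vectors of size D -> one vector of size D."""
    return [max(column) for column in zip(*token_features)]


def init_mlp(in_dim, hidden_dim, seed=0):
    """Randomly initialised one-hidden-layer MLP (assumed architecture, untrained)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(in_dim)
    return {
        "w1": [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(hidden_dim)],
        "b1": [0.0] * hidden_dim,
        "w2": [rng.uniform(-scale, scale) for _ in range(hidden_dim)],
        "b2": 0.0,
    }


def mlp_score(params, x):
    """Forward pass: linear -> ReLU -> linear -> sigmoid, giving a score in (0, 1)."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["w1"], params["b1"])
    ]
    logit = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return 1.0 / (1.0 + math.exp(-logit))


def sentence_score(params, token_features):
    """Pool token-level features, then score the pooled sentence vector."""
    return mlp_score(params, max_pool(token_features))


# Toy stand-in for per-token internal states (3 tokens, 3 features each).
tokens = [[0.1, -0.4, 0.3], [0.7, 0.2, -0.1], [-0.2, 0.5, 0.0]]
pooled = max_pool(tokens)  # [0.7, 0.5, 0.3]
params = init_mlp(in_dim=3, hidden_dim=4)
score = sentence_score(params, tokens)
```

Because the pooled vector has a fixed size regardless of sentence length, one forward pass per generated answer is enough in this sketch, which is where the saving over repeated sampling and pairwise semantic similarity computation would come from.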