🤖 AI Summary
This work investigates the “self-awareness” of large language models (LLMs) regarding their own knowledge boundaries, asking whether hallucination-prediction performance stems from genuine metacognitive reflection or from reliance on superficial question-side shortcuts. Method: We propose the Approximate Question-side Effect (AQE) metric to isolate question-perception biases from model introspection, and design SCAO (Semantic Compression by Answering in One word), a mechanism that strengthens model-side confidence signals via semantic abstraction and single-token responses. Contribution/Results: Through multi-dataset evaluation and systematic ablation of question-side cues, we demonstrate that SCAO significantly mitigates question-side shortcut exploitation while maintaining stable and accurate self-awareness, even under weak prompting. This work establishes a separable and empirically verifiable quantitative framework for modeling LLMs’ knowledge self-awareness.
📝 Abstract
Hallucination prediction in large language models (LLMs) is often interpreted as a sign of self-awareness. However, we argue that such performance can arise from question-side shortcuts rather than true model-side introspection. To disentangle these factors, we propose the Approximate Question-side Effect (AQE), which quantifies the contribution of question-side awareness to hallucination prediction. Our analysis across multiple datasets reveals that much of the reported success stems from exploiting superficial patterns in questions. We further introduce SCAO (Semantic Compression by Answering in One word), a method that enhances the use of model-side signals. Experiments show that SCAO achieves strong and consistent performance, particularly in settings with reduced question-side cues, highlighting its effectiveness in fostering genuine self-awareness in LLMs.
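The abstract does not specify implementation details, but the core SCAO idea it describes (prompting the model to answer in a single word/token and reading a model-side confidence signal from that token) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the `token_logprobs` dictionary stands in for whatever log-probabilities an LLM API returns for the one-word answer, and the 0.5 threshold is an arbitrary placeholder, not a value from the paper.

```python
import math

def scao_confidence(token_logprobs):
    """SCAO-style confidence readout (illustrative sketch).

    Given log-probabilities over candidate single-token answers, return
    the top answer and its probability. Because the response is compressed
    to one token, this probability serves as a model-side confidence
    signal that does not depend on surface patterns in the question.
    """
    token, logprob = max(token_logprobs.items(), key=lambda kv: kv[1])
    return token, math.exp(logprob)

# Toy distribution over one-word answers (illustrative numbers only).
logprobs = {"Paris": math.log(0.85), "Lyon": math.log(0.10), "Rome": math.log(0.05)}
answer, conf = scao_confidence(logprobs)

# High single-token confidence -> predict the model "knows" the answer;
# low confidence -> flag a likely hallucination. Threshold is hypothetical.
knows = conf >= 0.5
```

The point of the sketch is the signal's origin: the confidence comes from the model's own output distribution rather than from features of the question, which is the model-side/question-side distinction the paper draws.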