🤖 AI Summary
Existing uncertainty quantification methods struggle to localize uncertainty at the token level in long clinical texts. This work proposes Reverse Probing, which its authors describe as the first uncertainty quantification framework specialized for clinical summarization. It does not generate new outputs. Instead, it passes existing labeled summaries through a large language model and trains probes on the resulting internal activations, drawn from four feature categories, to extract token-level uncertainty signals. Evaluated on two expert-annotated clinical datasets, the approach outperforms eight adapted baselines on all metrics. It achieves up to a fourfold improvement in AUPRC while reducing inference time and computational cost. Feature analysis identifies delta energy and neighborhood context as the most consistent predictors across models, which offers interpretable insight into how a model's internal responses relate to unsupported clinical content.
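The core idea of scoring an existing summary instead of sampling new ones can be illustrated with a small token-level probe. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes per-token logits are already available from one teacher-forced forward pass over the summary. It takes "energy" to be the standard energy score -logsumexp(logits). It also uses invented stand-ins for "delta energy" (change in energy from the previous token) and "neighborhood context" (mean energy over a ±k token window). The helper names `energy`, `token_features`, `train_probe`, and `predict` are hypothetical. The probe is plain logistic regression, and the labels in the toy usage are synthetic.

```python
import math
from typing import List, Sequence, Tuple


def energy(logits: Sequence[float]) -> float:
    """Energy score E = -logsumexp(logits) over the vocabulary, computed stably."""
    m = max(logits)
    return -(m + math.log(sum(math.exp(z - m) for z in logits)))


def token_features(logits_per_token: List[List[float]], k: int = 2) -> List[List[float]]:
    """Hypothetical per-token features: [energy, delta energy, neighborhood mean energy]."""
    e = [energy(row) for row in logits_per_token]
    feats = []
    for t in range(len(e)):
        # Assumed "delta energy": change from the previous token (0 at the first token).
        delta = e[t] - e[t - 1] if t > 0 else 0.0
        # Assumed "neighborhood context": mean energy over a +/-k token window.
        lo, hi = max(0, t - k), min(len(e), t + k + 1)
        neigh = sum(e[lo:hi]) / (hi - lo)
        feats.append([e[t], delta, neigh])
    return feats


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def train_probe(X: List[List[float]], y: List[int],
                lr: float = 0.05, epochs: int = 1000) -> Tuple[List[float], float]:
    """Logistic-regression probe fit by full-batch gradient descent on mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w: List[float], b: float, X: List[List[float]]) -> List[float]:
    """Per-token probability that the token is unsupported (label 1)."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Toy usage: a peaked distribution stands in for a "supported" token and a flat
# one for an "unsupported" token (synthetic labels, not the paper's data).
confident, flat = [5.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]
logits = [confident, confident, flat, confident, confident, flat, confident]
labels = [0, 0, 1, 0, 0, 1, 0]
X = token_features(logits)
w, b = train_probe(X, labels)
scores = predict(w, b, X)  # flat-distribution tokens receive the highest scores
```

Because the features come from a single pass over text that already exists, the probe adds no sampling cost at inference. This is consistent with the reported savings in inference time, though the paper's actual feature set and probe architecture may differ from this sketch.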
📝 Abstract
As large language models are increasingly deployed on clinical text, ensuring that they can reliably signal their own uncertainty becomes critical. Most existing uncertainty quantification (UQ) methods are designed for open-domain generation and cannot localize uncertainty at the token or span level in long clinical text. We propose Reverse Probing, the first UQ framework specialized for clinical summarization, which estimates token-level uncertainty directly from pre-existing labeled summaries. Rather than sampling new outputs, Reverse Probing treats the existing text as a probe into the model's internal state and extracts uncertainty signals from four categories of internal activations. We evaluate on two expert-annotated clinical datasets and outperform eight adapted baselines on all metrics, achieving up to 4 times higher AUPRC while reducing inference time and computational costs. Feature analysis reveals that delta energy and neighborhood context are the most consistent predictors across all models. This study offers interpretable insights into how models internally respond to unsupported clinical content.
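Token-level evaluation with AUPRC can be sketched as follows. The sketch assumes each token carries a binary "unsupported" label and a probe score. It implements non-interpolated average precision, the sum over score thresholds of (increase in recall) × (precision), which is one common way to compute AUPRC. The abstract does not state which estimator the paper uses, and `average_precision` is a hypothetical helper name.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC: sum over thresholds of (R_n - R_{n-1}) * P_n.

    Tokens with tied scores are grouped into a single threshold.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AUPRC is undefined when there are no positive tokens")
    # Rank tokens from most to least uncertain.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(order):
        # Consume every token that shares the current score as one threshold.
        s = scores[order[i]]
        while i < len(order) and scores[order[i]] == s:
            if labels[order[i]]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Example: two unsupported tokens, one ranked first and one ranked third.
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])  # 0.5 * 1 + 0.5 * (2/3)
```

AUPRC is a common choice when positive labels are rare, because a random scorer's expected AUPRC is roughly the positive rate rather than 0.5.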