🤖 AI Summary
Neural NLP models often suffer from poor calibration, exhibiting overconfidence in incorrect predictions and thereby limiting their deployment in high-stakes applications. This work proposes a lightweight, inference-time uncertainty-aware attention mechanism that leverages Monte Carlo dropout to approximate Bayesian inference, estimating token-level epistemic uncertainty and dynamically modulating the self-attention weights of pretrained transformers, without altering the model architecture or training objective. Additionally, the authors introduce an inter-layer variance decomposition method to analyze how uncertainty accumulates across transformer layers. Experimental results demonstrate that the approach reduces expected calibration error by approximately 20% on average across SQuAD 2.0, MNLI, and SST-2, while preserving task accuracy and significantly enhancing selective prediction performance and robustness under distributional shift.
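The inter-layer variance decomposition can be illustrated with a minimal sketch. This is an assumption-laden toy, not the authors' exact formulation: it supposes a scalar readout (e.g., the max softmax probability) is collected at each layer across MC-dropout passes, and attributes to each layer the increment in predictive variance over the previous one. The helper name `layerwise_variance` is hypothetical.

```python
def layerwise_variance(samples_per_layer):
    """Decompose predictive variance across transformer depth (toy sketch).

    samples_per_layer: list over layers; each entry is a list of stochastic
    scalar predictions obtained by reading out at that layer across
    MC-dropout forward passes.
    Returns (per-layer variance, per-layer increment over previous layer).
    """
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    v = [var(xs) for xs in samples_per_layer]
    # Increment attributed to each layer: how much variance it adds on top
    # of the accumulated variance at the layer below.
    inc = [v[0]] + [v[i] - v[i - 1] for i in range(1, len(v))]
    return v, inc
```

A rising increment profile toward the top layers would indicate that uncertainty accumulates late in the network, which is the kind of diagnostic the summary attributes to the decomposition.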
📝 Abstract
Neural NLP models are often miscalibrated, assigning high confidence to incorrect predictions, which undermines selective prediction and high-stakes deployment. Post-hoc calibration methods adjust output probabilities but leave internal computation unchanged, while ensemble and Bayesian approaches improve uncertainty at substantial training or storage cost. We propose UAT-LITE, an inference-time framework that makes self-attention uncertainty-aware using approximate Bayesian inference via Monte Carlo dropout in pretrained transformer classifiers. Token-level epistemic uncertainty is estimated from stochastic forward passes and used to modulate self-attention during contextualization, without modifying pretrained weights or training objectives. We additionally introduce a layerwise variance decomposition to diagnose how predictive uncertainty accumulates across transformer depth. Across SQuAD 2.0 answerability, MNLI, and SST-2, UAT-LITE reduces Expected Calibration Error by approximately 20% on average relative to a fine-tuned BERT-base baseline while preserving task accuracy, and improves selective prediction and robustness under distribution shift.
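The core mechanism described above, estimating per-token uncertainty from stochastic dropout passes and using it to reweight attention, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not UAT-LITE itself: the inverted-dropout scaling, the sample-variance estimator, and the linear penalty `beta * u` on attention logits are all illustrative choices, and both helper names are hypothetical.

```python
import math
import random

def mc_dropout_uncertainty(token_scores, p=0.1, passes=20, seed=0):
    """Estimate per-token epistemic uncertainty via Monte Carlo dropout.

    token_scores: per-token scalar scores from a (hypothetical) encoder.
    Dropout stays active at inference; each pass zeroes a score with
    probability p (with inverted-dropout rescaling), and uncertainty is
    the variance of each token's score across passes.
    """
    rng = random.Random(seed)
    keep = 1.0 - p
    samples = [
        [s * (0.0 if rng.random() < p else 1.0 / keep) for s in token_scores]
        for _ in range(passes)
    ]
    n = len(token_scores)
    means = [sum(run[i] for run in samples) / passes for i in range(n)]
    return [
        sum((run[i] - means[i]) ** 2 for run in samples) / passes
        for i in range(n)
    ]

def uncertainty_aware_attention(attn_logits, uncertainty, beta=1.0):
    """Penalize attention logits of uncertain tokens, then renormalize."""
    adjusted = [a - beta * u for a, u in zip(attn_logits, uncertainty)]
    # Numerically stable softmax over the adjusted logits.
    m = max(adjusted)
    exps = [math.exp(a - m) for a in adjusted]
    z = sum(exps)
    return [e / z for e in exps]
```

Because only the attention weights are modulated at inference time, a mechanism of this shape leaves pretrained weights and the training objective untouched, matching the deployment constraint the abstract emphasizes.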