UAT-LITE: Inference-Time Uncertainty-Aware Attention for Pretrained Transformers

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Neural NLP models often suffer from poor calibration, exhibiting overconfidence in incorrect predictions and thereby limiting their deployment in high-stakes applications. This work proposes a lightweight, inference-time uncertainty-aware attention mechanism that leverages Monte Carlo dropout to approximate Bayesian inference, estimating token-level epistemic uncertainty and dynamically modulating the self-attention weights of pretrained transformers, without altering the model architecture or training objective. Additionally, the authors introduce an inter-layer variance decomposition method to analyze how uncertainty accumulates across transformer layers. Experimental results demonstrate that the approach reduces expected calibration error by approximately 20% on average across SQuAD 2.0, MNLI, and SST-2, while preserving task accuracy and significantly enhancing selective prediction performance and robustness under distributional shift.
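The headline metric above, Expected Calibration Error (ECE), measures the gap between a model's confidence and its empirical accuracy. As a point of reference, here is a minimal, self-contained sketch of the standard equal-width-binning ECE; it is not the paper's evaluation code, and the binning choices (10 bins, right-inclusive edges) are conventional defaults, not details taken from this work.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then sum the per-bin
    |accuracy - mean confidence| gaps weighted by bin size.

    confidences: list of floats in [0, 1] (top-class probabilities)
    correct:     list of 0/1 flags (1 if the prediction was right)
    """
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # A prediction falls in bin b if lo < conf <= hi
        # (the first bin also includes conf == 0).
        idx = [i for i, c in enumerate(confidences)
               if (c > lo or (b == 0 and c >= lo)) and c <= hi]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(acc - conf)
    return ece
```

For example, four predictions all made at 0.95 confidence of which three are correct land in one bin with accuracy 0.75, giving an ECE of 0.20; a "20% reduction" in the paper's sense means shrinking this number by a fifth.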

📝 Abstract
Neural NLP models are often miscalibrated, assigning high confidence to incorrect predictions, which undermines selective prediction and high-stakes deployment. Post-hoc calibration methods adjust output probabilities but leave internal computation unchanged, while ensemble and Bayesian approaches improve uncertainty at substantial training or storage cost. We propose UAT-LITE, an inference-time framework that makes self-attention uncertainty-aware using approximate Bayesian inference via Monte Carlo dropout in pretrained transformer classifiers. Token-level epistemic uncertainty is estimated from stochastic forward passes and used to modulate self-attention during contextualization, without modifying pretrained weights or training objectives. We additionally introduce a layerwise variance decomposition to diagnose how predictive uncertainty accumulates across transformer depth. Across SQuAD 2.0 answerability, MNLI, and SST-2, UAT-LITE reduces Expected Calibration Error by approximately 20% on average relative to a fine-tuned BERT-base baseline while preserving task accuracy, and improves selective prediction and robustness under distribution shift.
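The two core steps the abstract describes, estimating token-level epistemic uncertainty from stochastic (dropout-active) forward passes and using it to modulate self-attention, can be sketched as follows. This is an illustrative toy, not the paper's implementation: the names (`mc_dropout_passes`, `uncertainty_aware_attention`), the use of per-token variance as the uncertainty signal, and the specific modulation rule (subtracting a scaled uncertainty penalty from the attention logits before softmax) are all assumptions, since the abstract does not specify the exact form.

```python
import math
import random

def mc_dropout_passes(forward, tokens, n_passes=8, seed=0):
    """Run several stochastic forward passes (dropout kept active at
    inference) and return per-token mean score and variance; the variance
    serves as the token-level epistemic uncertainty estimate."""
    rng = random.Random(seed)
    samples = [forward(tokens, rng) for _ in range(n_passes)]
    means, variances = [], []
    for t in range(len(tokens)):
        vals = [s[t] for s in samples]
        m = sum(vals) / n_passes
        means.append(m)
        variances.append(sum((v - m) ** 2 for v in vals) / n_passes)
    return means, variances

def uncertainty_aware_attention(scores, uncertainty, lam=1.0):
    """Hypothetical modulation rule: penalize the attention logit of each
    key token by lam * its epistemic uncertainty, then renormalize with a
    numerically stable softmax, so uncertain tokens receive less attention."""
    logits = [s - lam * u for s, u in zip(scores, uncertainty)]
    mx = max(logits)
    exps = [math.exp(l - mx) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Toy "layer": identity plus dropout-like noise, standing in for a
# transformer sublayer with dropout left on at inference time.
def noisy_forward(tokens, rng, p_drop=0.1):
    return [0.0 if rng.random() < p_drop else x for x in tokens]
```

With equal raw attention scores, a token whose MC-dropout variance is higher ends up with strictly less attention mass than its low-uncertainty peer, which is the qualitative behavior the method aims for.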
Problem

Research questions and friction points this paper is trying to address.

model calibration
uncertainty estimation
selective prediction
distribution shift
neural NLP
Innovation

Methods, ideas, or system contributions that make the work stand out.

uncertainty-aware attention
Monte Carlo dropout
inference-time calibration
epistemic uncertainty
transformer calibration