🤖 AI Summary
Pretrained Transformers lack an effective mechanism for propagating uncertainty, limiting their reliability in risk-sensitive scenarios. This work integrates diffusion-process principles into the Transformer architecture by modeling each feature-transformation block as a probabilistic mapping, thereby establishing an end-to-end pathway for propagating uncertainty from the input distribution to the pretrained feature distribution. The resulting model significantly improves calibration without compromising predictive accuracy. Empirical evaluations across multiple vision and language benchmarks demonstrate that this approach outperforms existing uncertainty-aware Transformers, offering a more reliable framework for applications where calibrated uncertainty estimates are critical.
📝 Abstract
Uncertainty calibration in pre-trained transformers is critical for their reliable deployment in risk-sensitive applications. Yet most existing pre-trained transformers lack a principled mechanism for propagating uncertainty through their feature-transformation stack. In this work, we propose a diffusion-inspired reconfiguration of transformers in which each feature-transformation block is modeled as a probabilistic mapping. Composing these probabilistic mappings reveals a probability path that mimics the structure of a diffusion process, transporting probability mass from the input distribution to the pre-trained feature distribution. This probability path can then be recast as a diffusion process with a unified transition model, enabling principled propagation of representation uncertainty through the pre-trained model's architecture while maintaining its original predictive performance. Empirical results across a variety of vision and language benchmarks demonstrate that our method achieves superior calibration and predictive accuracy compared to existing uncertainty-aware transformers.
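To make the core idea concrete, here is a minimal illustrative sketch (not the paper's actual method or its unified transition model): each block of a network is treated as a probabilistic mapping x_{t+1} ~ N(f_t(x_t), σ_t² I), and composing the blocks yields a probability path along which input uncertainty can be propagated, here by simple Monte Carlo sampling. The block definitions, dimensions, and noise scales below are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pretrained feature-transformation block:
# a deterministic map plus Gaussian transition noise,
# i.e. x_{t+1} ~ N(f_t(x_t), sigma_t^2 I).
def make_block(W, sigma):
    def block(x):
        mean = np.tanh(x @ W)  # deterministic feature map (toy example)
        return mean + sigma * rng.standard_normal(mean.shape)
    return block

dim, depth, n_samples = 8, 4, 1000
blocks = [make_block(rng.standard_normal((dim, dim)) / np.sqrt(dim), 0.1)
          for _ in range(depth)]

# Propagate an input distribution (a Gaussian around x0) through the
# composed probabilistic mappings via Monte Carlo sampling.
x0 = rng.standard_normal(dim)
samples = x0 + 0.05 * rng.standard_normal((n_samples, dim))
for block in blocks:
    samples = block(samples)

feature_mean = samples.mean(axis=0)  # propagated feature estimate
feature_var = samples.var(axis=0)    # per-dimension representation uncertainty
print(feature_mean.shape, feature_var.shape)
```

The sample variance at the output is a crude estimate of representation uncertainty; the paper's contribution is to replace such ad hoc sampling with a unified diffusion-process transition model fitted to the pre-trained network's probability path.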