🤖 AI Summary
Vision Transformers (ViTs) require well-calibrated predictive uncertainties in risk-sensitive applications, yet conventional temperature scaling relies on a single global scalar and requires a separate validation set, which limits its practicality. To address this, we propose Calibration Attention (CalAttn), a plug-and-play module that enables instance-level temperature scaling without any validation data. CalAttn leverages the ViT's [CLS] token to learn sample-adaptive temperature coefficients, tightly coupling temperature estimation with the model's internal representation and optimizing calibration end-to-end. Adding only a negligible number of trainable parameters, CalAttn yields stable temperature values centered near 1.0. Evaluated across multiple benchmarks, it reduces Expected Calibration Error (ECE) by up to 4× compared to baseline methods, substantially improving predictive reliability while preserving classification accuracy.
📝 Abstract
Probability calibration is critical when Vision Transformers are deployed in risk-sensitive applications. The standard fix, post-hoc temperature scaling, uses a single global scalar and requires a held-out validation set. We introduce Calibration Attention (CalAttn), a drop-in module that learns an adaptive, per-instance temperature directly from the ViT's [CLS] token. Across CIFAR-10/100, MNIST, Tiny-ImageNet, and ImageNet-1K, CalAttn reduces calibration error by up to 4× on ViT-224, DeiT, and Swin, while adding under 0.1 percent additional parameters. The learned temperatures cluster tightly around 1.0, in contrast to the large global values used by standard temperature scaling. CalAttn is simple, efficient, and architecture-agnostic, and yields more trustworthy probabilities without sacrificing accuracy. Code: [https://github.com/EagleAdelaide/CalibrationAttention-CalAttn-](https://github.com/EagleAdelaide/CalibrationAttention-CalAttn-)
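To make the core idea concrete, here is a minimal NumPy sketch of per-instance temperature scaling driven by a [CLS] embedding. This is an illustrative assumption, not the paper's actual implementation: the head (`TemperatureHead`), its linear-plus-softplus parameterization, and the near-zero initialization (so temperatures start at 1.0, consistent with the reported clustering around 1.0) are all hypothetical choices for exposition.

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x))
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

def softmax(z):
    # Shift by the row max for numerical stability
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class TemperatureHead:
    """Hypothetical head mapping a [CLS] embedding to a per-sample temperature.

    A single linear map followed by softplus, offset so the temperature
    equals exactly 1.0 when the raw score is zero (near-zero weight init).
    """
    def __init__(self, dim, rng=None):
        rng = rng or np.random.default_rng(0)
        self.w = rng.normal(0.0, 1e-3, size=dim)  # near-zero init => T ~ 1.0
        self.b = 0.0

    def __call__(self, cls_tokens):
        raw = cls_tokens @ self.w + self.b          # (batch,) raw scores
        # softplus keeps T > 0; the offset centers T at 1.0 when raw == 0
        return softplus(raw) + (1.0 - softplus(0.0))

def calibrated_probs(logits, cls_tokens, head):
    # Divide each sample's logits by its own learned temperature
    T = head(cls_tokens)[:, None]                   # (batch, 1)
    return softmax(logits / T)
```

Unlike global temperature scaling, which applies one scalar to every sample after training on a validation set, the head above is a learnable component of the network, so the temperature can be optimized end-to-end alongside the classifier.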