🤖 AI Summary
Vision Transformers (ViTs) require well-calibrated predictive uncertainties in risk-sensitive applications, yet conventional temperature scaling relies on a single global scalar and requires a separate validation set, which limits its practicality. To address this, we propose Calibration Attention (CalAttn), a plug-and-play module that enables instance-level temperature scaling without any validation data. CalAttn leverages the ViT's [CLS] token to learn sample-adaptive temperature coefficients, tightly coupling temperature estimation with the model's internal representation and optimizing calibration end-to-end. Adding only a negligible number of trainable parameters, CalAttn yields stable temperature values centered near 1.0. Evaluated across multiple benchmarks, it reduces Expected Calibration Error (ECE) by up to 4× compared to baseline methods, substantially improving predictive reliability while preserving classification accuracy.
📝 Abstract
Probability calibration is critical when Vision Transformers are deployed in risk-sensitive applications. The standard fix, post-hoc temperature scaling, uses a single global scalar and requires a held-out validation set. We introduce Calibration Attention (CalAttn), a drop-in module that learns an adaptive, per-instance temperature directly from the ViT's [CLS] token. Across CIFAR-10/100, MNIST, Tiny-ImageNet, and ImageNet-1K, CalAttn reduces calibration error by up to 4× on ViT-224, DeiT, and Swin, while adding under 0.1 percent additional parameters. The learned temperatures cluster tightly around 1.0, in contrast to the large global values used by standard temperature scaling. CalAttn is simple, efficient, and architecture-agnostic, and yields more trustworthy probabilities without sacrificing accuracy. Code: [https://github.com/EagleAdelaide/CalibrationAttention-CalAttn-](https://github.com/EagleAdelaide/CalibrationAttention-CalAttn-)
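To make the core idea concrete, here is a minimal NumPy sketch of per-instance temperature scaling driven by a [CLS] embedding. This is an illustrative assumption, not the paper's actual implementation: the head (`TemperatureHead`), its linear-plus-softplus parameterization, and the near-zero initialization (so temperatures start at 1.0, consistent with the reported clustering around 1.0) are all hypothetical choices for exposition.

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x))
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

def softmax(z):
    # Shift by the row max for numerical stability
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class TemperatureHead:
    """Hypothetical head mapping a [CLS] embedding to a per-sample temperature.

    A single linear map followed by softplus, offset so the temperature
    equals exactly 1.0 when the raw score is zero (near-zero weight init).
    """
    def __init__(self, dim, rng=None):
        rng = rng or np.random.default_rng(0)
        self.w = rng.normal(0.0, 1e-3, size=dim)  # near-zero init => T ~ 1.0
        self.b = 0.0

    def __call__(self, cls_tokens):
        raw = cls_tokens @ self.w + self.b          # (batch,) raw scores
        # softplus keeps T > 0; the offset centers T at 1.0 when raw == 0
        return softplus(raw) + (1.0 - softplus(0.0))

def calibrated_probs(logits, cls_tokens, head):
    # Divide each sample's logits by its own learned temperature
    T = head(cls_tokens)[:, None]                   # (batch, 1)
    return softmax(logits / T)
```

Unlike global temperature scaling, which applies one scalar to every sample after training on a validation set, the head above is a learnable component of the network, so the temperature can be optimized end-to-end alongside the classifier.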