🤖 AI Summary
The softmax temperature parameter $T$ in deep classification models generalizes poorly and requires task-specific tuning, hindering robust calibration. Method: This paper proposes a training-free, robust analytical method for setting $T$. It establishes, for the first time, a theoretical closed-form relationship between the optimal temperature $T^*$ and feature dimensionality; designs a transferable temperature-correction coefficient framework; and introduces a lightweight, task-aware refinement mechanism. The method requires no additional training or validation data; it needs only intermediate-layer feature dimensions and their statistical properties. Contribution/Results: Evaluated systematically across diverse architectures (e.g., ResNet, ViT) and domains (image, text, remote sensing), the approach significantly improves calibration accuracy and classification performance. It exhibits strong cross-task and cross-domain generalization with negligible computational overhead.
📄 Abstract
In deep learning-based classification tasks, the softmax function's temperature parameter $T$ critically influences the output distribution and overall performance. This study presents a novel theoretical insight that the optimal temperature $T^*$ is uniquely determined by the dimensionality of the feature representations, thereby enabling training-free determination of $T^*$. Despite this theoretical grounding, empirical evidence reveals that $T^*$ fluctuates under practical conditions owing to variations in models, datasets, and other confounding factors. To address these influences, we propose and optimize a set of temperature determination coefficients that specify how $T^*$ should be adjusted based on the theoretical relationship to feature dimensionality. Additionally, we insert a batch normalization layer immediately before the output layer, effectively stabilizing the feature space. Building on these coefficients and a suite of large-scale experiments, we develop an empirical formula to estimate $T^*$ without additional training while also introducing a corrective scheme to refine $T^*$ based on the number of classes and task complexity. Our findings confirm that the derived temperature not only aligns with the proposed theoretical perspective but also generalizes effectively across diverse tasks, consistently enhancing classification performance and offering a practical, training-free solution for determining $T^*$.
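For readers unfamiliar with the role of $T$, the temperature-scaled softmax the abstract refers to can be sketched as follows. This is a minimal NumPy illustration of the standard formula $p_i = \exp(z_i/T) / \sum_j \exp(z_j/T)$, not the paper's method for choosing $T^*$ (the paper's closed-form relationship to feature dimensionality is not reproduced here); the logit values are arbitrary examples.

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Temperature-scaled softmax: divide logits by T, then normalize.

    T < 1 sharpens the distribution toward the argmax;
    T > 1 flattens it toward uniform; T = 1 is the standard softmax.
    """
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Example logits (arbitrary, for illustration only)
logits = [2.0, 1.0, 0.1]
p_sharp = softmax_with_temperature(logits, T=0.5)  # more confident
p_base  = softmax_with_temperature(logits, T=1.0)
p_soft  = softmax_with_temperature(logits, T=2.0)  # less confident
```

Because calibration quality depends directly on how $T$ sharpens or flattens these probabilities, choosing $T^*$ without a validation sweep, as the paper proposes, removes a tuning step that is otherwise repeated per task.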