🤖 AI Summary
Softmax outputs lack hard probabilistic constraints, leading to unreliable model calibration. To address this, we propose Box-Constrained Softmax (BCSoftmax), the first method to impose explicit box constraints—i.e., analytically defined upper and lower bounds—directly on the softmax output space. BCSoftmax yields a differentiable, closed-form, and theoretically grounded probabilistic constraint mechanism. Building upon it, we design two novel post-hoc calibration methods: (1) analytical calibration driven by constrained optimization, and (2) boundary-aware logit rescaling. Extensive experiments on TinyImageNet, CIFAR-100, and 20NewsGroups demonstrate that BCSoftmax significantly reduces Expected Calibration Error (ECE) by an average of 38.2% and improves Brier Score, effectively mitigating both overconfidence and underconfidence. The approach enhances model trustworthiness and deployment robustness without compromising accuracy or inference efficiency.
📝 Abstract
Controlling the output probabilities of softmax-based models is a common problem in modern machine learning. Although the $\mathrm{Softmax}$ function provides soft control via its temperature parameter, it lacks the ability to enforce hard constraints, such as box constraints, on output probabilities, which can be critical in applications requiring reliable and trustworthy models. In this work, we propose the box-constrained softmax ($\mathrm{BCSoftmax}$) function, a novel generalization of the $\mathrm{Softmax}$ function that explicitly enforces lower and upper bounds on output probabilities. Although $\mathrm{BCSoftmax}$ is formulated as the solution to a box-constrained optimization problem, we develop an exact and efficient algorithm for computing it. As a key application, we introduce two post-hoc calibration methods based on $\mathrm{BCSoftmax}$. The proposed methods mitigate underconfidence and overconfidence in predictive models by learning the lower and upper bounds of the output probabilities or logits after model training, thereby enhancing reliability in downstream decision-making tasks. We demonstrate the effectiveness of our methods experimentally on the TinyImageNet, CIFAR-100, and 20NewsGroups datasets, achieving improvements in calibration metrics.
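To make the idea concrete, here is a minimal sketch of what a box-constrained softmax could look like. It is *not* the paper's algorithm (which is not reproduced on this page); it assumes BCSoftmax is the maximizer of $\langle p, z\rangle + H(p)$ subject to $l \le p \le u$ and $\mathbf{1}^\top p = 1$, whose KKT conditions give $p_i = \mathrm{clip}(e^{z_i}/t,\, l_i,\, u_i)$ for a scalar $t > 0$ found by bisection so the probabilities sum to one. The function name `bc_softmax` and the strict-feasibility assumption ($\sum_i l_i < 1 < \sum_i u_i$) are ours:

```python
import numpy as np

def bc_softmax(z, lower, upper, tol=1e-12, max_iter=200):
    """Sketch of a box-constrained softmax (NOT the paper's exact algorithm).

    Solves  max_p <p, z> + H(p)  s.t.  lower <= p <= upper,  sum(p) = 1,
    using the KKT form p_i = clip(exp(z_i) / t, lower_i, upper_i) and
    bisection on the scalar t. Assumes sum(lower) < 1 < sum(upper).
    """
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # shift logits for numerical stability

    def total(t):
        # Non-increasing in t, so bisection on total(t) = 1 is valid.
        return np.clip(e / t, lower, upper).sum()

    # Bracket the root: shrink/grow t until total(t_lo) >= 1 >= total(t_hi).
    t_lo = t_hi = 1.0
    while total(t_lo) < 1.0:
        t_lo *= 0.5
    while total(t_hi) > 1.0:
        t_hi *= 2.0

    for _ in range(max_iter):
        t = 0.5 * (t_lo + t_hi)
        if total(t) > 1.0:
            t_lo = t
        else:
            t_hi = t
        if t_hi - t_lo < tol * t_hi:
            break
    return np.clip(e / t_hi, lower, upper)
```

With trivial bounds (`lower = 0`, `upper = 1`) no entry is clipped and the result reduces to the ordinary softmax; tightening an upper bound caps that class's probability and redistributes the excess mass over the unclipped classes in softmax proportion.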