🤖 AI Summary
This paper addresses the rigidity of conventional argmax decision boundaries in multi-class classification by proposing the first threshold-driven classification framework tailored for multi-class settings. Methodologically, it reinterprets softmax outputs as geometric points on the probability simplex, replaces scalar thresholds with learnable multidimensional thresholds, and introduces a score-guided loss function based on stochastic threshold sampling—enabling post-hoc calibration without architectural modification. Key contributions include: (1) the first systematic generalization of binary threshold optimization to multi-class classification; (2) a plug-and-play, architecture-agnostic post-processing mechanism compatible with arbitrary pre-trained models; and (3) consistent accuracy improvements across diverse model architectures and benchmark datasets, with the proposed loss matching cross-entropy performance—demonstrating both the effectiveness and generalizability of multi-class threshold-driven optimization.
📝 Abstract
In this paper, we introduce a threshold-based framework for multiclass classification that generalizes the standard argmax rule. We do so by replacing the probabilistic interpretation of softmax outputs with a geometric one on the multidimensional simplex, where the classification decision depends on a multidimensional threshold. This change of perspective enables, for any trained classification network, an a posteriori optimization of the classification score by means of threshold tuning, as is commonly done in the binary setting, thereby further refining the predictive capability of any network. Moreover, this multidimensional threshold-based setting makes it possible to define score-oriented losses, based on the interpretation of the threshold as a random variable. Our experiments show that multidimensional threshold tuning yields consistent performance improvements across various networks and datasets, and that the proposed multiclass score-oriented losses are competitive with standard loss functions, mirroring the advantages observed in the binary case.
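To make the idea concrete, here is a minimal sketch of one plausible instantiation of such a rule: shifting each softmax probability by a per-class threshold before taking the argmax. The function names and the specific shifted score `p_k - tau_k` are illustrative assumptions, not the paper's exact formulation; the point is only that a uniform threshold vector recovers plain argmax, while a tuned non-uniform one can move the decision boundaries on the simplex.

```python
from typing import List

def argmax_predict(probs: List[float]) -> int:
    """Standard argmax decision rule over softmax probabilities."""
    return max(range(len(probs)), key=lambda k: probs[k])

def threshold_predict(probs: List[float], tau: List[float]) -> int:
    """Hypothetical threshold-shifted rule: pick the class maximizing
    p_k - tau_k. With a uniform threshold vector this reduces to argmax."""
    return max(range(len(probs)), key=lambda k: probs[k] - tau[k])

p = [0.50, 0.30, 0.20]

# Uniform threshold: both rules agree (class 0 wins either way).
assert threshold_predict(p, [1/3, 1/3, 1/3]) == argmax_predict(p) == 0

# A non-uniform threshold, tuned a posteriori on validation data,
# can flip the decision toward an otherwise under-predicted class:
# shifted scores are 0.10, 0.25, -0.10, so class 1 is selected.
assert threshold_predict(p, [0.40, 0.05, 0.30]) == 1
```

Since the rule only post-processes probabilities, the threshold vector can be tuned on held-out data for any pre-trained network without touching its weights, which is what makes the approach plug-and-play.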