🤖 AI Summary
Multi-class neural networks are typically trained with cross-entropy loss, yet evaluation commonly relies on confusion-matrix-based metrics such as F<sub>β</sub> and the Matthews Correlation Coefficient (MCC), creating a misalignment between the training objective and the evaluation metric, especially under class imbalance or when users prioritize a specific β (e.g., β < 1 for high precision). Method: We propose the first end-to-end optimization framework targeting multi-class Macro-F<sub>β</sub>. It constructs a differentiable d×d soft confusion matrix, incorporates run-time-adaptive thresholds τ, and employs a piecewise-linear Heaviside approximation to yield a gradient-friendly F<sub>β</sub> surrogate. Theory: The surrogate is proven to be a consistent estimator of Macro-F<sub>β</sub>. Results: Extensive experiments on benchmark datasets show significant improvements in the target F<sub>β</sub> scores, particularly for β ≠ 1, outperforming both standard cross-entropy training and existing F<sub>1</sub>-oriented methods, and validating the effectiveness and generality of aligning training objectives with evaluation metrics.
📝 Abstract
Multiclass neural network classifiers are typically trained using cross-entropy loss. After training, the performance of the same network is evaluated using an application-specific metric based on the multiclass confusion matrix, such as the Macro $F_\beta$-Score. It is questionable whether cross-entropy yields a classifier that aligns with the intended application-specific performance criteria, particularly when one aspect of classifier performance must be emphasized. For example, if greater precision is preferred over recall, the $\beta$ value in the $F_\beta$ evaluation metric can be adjusted accordingly, but the cross-entropy objective remains unaware of this preference during training. We propose a method that closes this training-evaluation gap for multiclass neural network classifiers, allowing users to train these models informed by the desired final $F_\beta$-Score. Following prior work in binary classification, we utilize soft-set confusion matrices and a piecewise-linear approximation of the Heaviside step function. Our method extends the $2 \times 2$ binary soft-set confusion matrix to a multiclass $d \times d$ confusion matrix and dynamically adapts, at run time, the threshold value $\tau$ that parameterizes the piecewise-linear Heaviside approximation. We present a theoretical analysis showing that our method optimizes a soft-set approximation of Macro-$F_\beta$ that is a consistent estimator of Macro-$F_\beta$, and our extensive experiments show the practical effectiveness of our approach.
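To make the core idea concrete, here is a minimal NumPy sketch of a soft-set Macro-F<sub>β</sub> surrogate. It is not the paper's implementation: the `soft_heaviside` below uses a simplified clamped linear ramp of half-width `delta` around the threshold `tau` as the piecewise-linear Heaviside approximation, and `tau`/`delta` are fixed rather than adapted at run time. All function names and parameters are illustrative assumptions.

```python
import numpy as np

def soft_heaviside(p, tau, delta=0.1):
    """Clamped linear ramp approximating the step H(p - tau).

    Illustrative piecewise-linear stand-in: 0 below tau - delta,
    1 above tau + delta, linear in between (not the paper's exact form).
    """
    return np.clip((p - (tau - delta)) / (2.0 * delta), 0.0, 1.0)

def soft_macro_fbeta(probs, labels, tau=0.5, beta=1.0, delta=0.1, eps=1e-8):
    """Soft-set Macro-F_beta surrogate.

    probs:  (n, d) per-class scores/probabilities
    labels: (n,)   integer ground-truth classes
    """
    n, d = probs.shape
    onehot = np.eye(d)[labels]                    # (n, d) true-class indicators
    member = soft_heaviside(probs, tau, delta)    # soft class memberships

    # Column sums give the per-class entries of the d x d soft confusion
    # matrix that Macro-F_beta needs: soft TP, FP, FN per class.
    tp = (member * onehot).sum(axis=0)
    fp = (member * (1.0 - onehot)).sum(axis=0)
    fn = ((1.0 - member) * onehot).sum(axis=0)

    b2 = beta ** 2
    fbeta = (1.0 + b2) * tp / ((1.0 + b2) * tp + b2 * fn + fp + eps)
    return fbeta.mean()   # macro average over the d classes
```

Because every operation is piecewise-linear or smooth, the same computation written in an autodiff framework (e.g., PyTorch) yields gradients almost everywhere, so `1 - soft_macro_fbeta(...)` can serve directly as a training loss.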