Aligning Multiclass Neural Network Classifier Criterion with Task Performance via Fβ-Score

📅 2024-05-31
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multi-class neural networks are typically trained with cross-entropy loss, yet evaluation commonly relies on confusion-matrix-based metrics such as F<sub>β</sub> and Matthews Correlation Coefficient (MCC), leading to objective–evaluation misalignment—especially under class imbalance or when users prioritize specific β (e.g., high precision). Method: We propose the first end-to-end optimization framework targeting multi-class Macro-F<sub>β</sub>. It constructs a differentiable d×d soft confusion matrix, incorporates runtime-adaptive thresholds τ, and employs a piecewise-linear Heaviside approximation to yield a gradient-descent–friendly F<sub>β</sub> surrogate. Theoretically, our surrogate is proven to be a consistent estimator of Macro-F<sub>β</sub>. Results: Extensive experiments on benchmark datasets demonstrate significant improvements in target F<sub>β</sub> scores—particularly for β ≠ 1—outperforming both standard cross-entropy training and existing F<sub>1</sub>-oriented methods. This validates the effectiveness and generalizability of aligning training objectives with evaluation metrics.

📝 Abstract
Multiclass neural network classifiers are typically trained using cross-entropy loss. Following training, the performance of this same neural network is evaluated using an application-specific metric based on the multiclass confusion matrix, such as the Macro $F_\beta$-Score. It is questionable whether the use of cross-entropy will yield a classifier that aligns with the intended application-specific performance criteria, particularly in scenarios where there is a need to emphasize one aspect of classifier performance. For example, if greater precision is preferred over recall, the $\beta$ value in the $F_\beta$ evaluation metric can be adjusted accordingly, but the cross-entropy objective remains unaware of this preference during training. We propose a method that addresses this training-evaluation gap for multiclass neural network classifiers such that users can train these models informed by the desired final $F_\beta$-Score. Following prior work in binary classification, we utilize the concepts of the soft-set confusion matrices and a piecewise-linear approximation of the Heaviside step function. Our method extends the $2 \times 2$ binary soft-set confusion matrix to a multiclass $d \times d$ confusion matrix and proposes dynamic adaptation of the threshold value $\tau$, which parameterizes the piecewise-linear Heaviside approximation during run-time. We present a theoretical analysis that shows that our method can be used to optimize for a soft-set based approximation of Macro-$F_\beta$ that is a consistent estimator of Macro-$F_\beta$, and our extensive experiments show the practical effectiveness of our approach.
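The core construction described above can be sketched in a few lines of NumPy. This is an illustrative reading, not the authors' implementation: the names `soft_heaviside` and `soft_macro_fbeta` are hypothetical, the simple clipped ramp stands in for the paper's exact piecewise-linear form, and `tau` is held fixed here rather than adapted at run-time as the method proposes.

```python
import numpy as np

def soft_heaviside(p, tau, delta=0.1):
    """Piecewise-linear stand-in for the Heaviside step H(p - tau):
    a ramp from 0 to 1 over [tau - delta, tau + delta].  Illustrative
    only; the paper's three-segment form and run-time tau adaptation
    may differ."""
    return np.clip((p - tau + delta) / (2.0 * delta), 0.0, 1.0)

def soft_macro_fbeta(probs, y_onehot, beta=1.0, tau=0.5, eps=1e-8):
    """Soft-set Macro-F_beta surrogate from soft class memberships.

    probs:    (n, d) predicted class probabilities
    y_onehot: (n, d) one-hot ground-truth labels
    Accumulates soft TP/FP/FN per class (the diagonal-relevant entries
    of a d x d soft confusion matrix), then averages per-class F_beta.
    Training would minimize 1 - soft_macro_fbeta(...).
    """
    s = soft_heaviside(probs, tau)           # soft "predicted" memberships
    tp = (y_onehot * s).sum(axis=0)          # soft true positives per class
    fp = ((1 - y_onehot) * s).sum(axis=0)    # soft false positives per class
    fn = (y_onehot * (1 - s)).sum(axis=0)    # soft false negatives per class
    b2 = beta ** 2
    fbeta = (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp + eps)
    return fbeta.mean()
```

With perfectly confident, correct predictions the surrogate approaches 1; as probabilities drift below the threshold it falls toward 0, and every intermediate value carries a usable gradient through the ramp.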
Problem

Research questions and friction points this paper is trying to address.

Mismatch between cross-entropy training and evaluation metrics
Suboptimal performance due to class imbalance and metric preferences
Need for alignment between classifier predictions and target metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic thresholding approach during training
Multiclass soft-set confusion matrix
Annealing schedule that progressively aligns the surrogate loss with the target Macro-Fβ metric
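The annealing idea in the last bullet can be sketched as a simple loss-blending schedule. This is a hypothetical illustration (the function name and the linear schedule are assumptions; the paper's actual annealing scheme is not detailed on this page):

```python
def annealed_loss(ce_loss, fbeta_surrogate_loss, epoch, total_epochs):
    """Blend cross-entropy with the F-beta surrogate, shifting weight
    linearly from the former to the latter over training.  Illustrative
    schedule only; the paper's exact annealing may differ."""
    alpha = min(1.0, epoch / max(1, total_epochs - 1))
    return (1 - alpha) * ce_loss + alpha * fbeta_surrogate_loss
```

Early epochs thus benefit from the well-conditioned cross-entropy gradient, while later epochs optimize the metric-aligned surrogate directly.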