Minimax Generalized Cross-Entropy

πŸ“… 2026-03-20
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses a limitation of the existing Generalized Cross-Entropy (GCE) loss: it yields a non-convex optimization over classification margins that is prone to underfitting on complex datasets, forcing a trade-off between robustness and optimization efficiency. To overcome this, the authors propose the Minimax Generalized Cross-Entropy (MGCE) loss, which reformulates GCE as a convex minimax problem with respect to the classification margin. MGCE simultaneously ensures robustness to label noise and enables efficient optimization, while providing a theoretical upper bound on the classification error. Leveraging a bilevel convex optimization framework and implicit differentiation, the method supports scalable stochastic gradient training. Experimental results demonstrate that MGCE consistently improves accuracy, convergence speed, and prediction calibration on standard benchmarks, both with clean labels and under label noise.

πŸ“ Abstract
Loss functions play a central role in supervised classification. Cross-entropy (CE) is widely used, whereas the mean absolute error (MAE) loss can offer robustness but is difficult to optimize. Interpolating between the CE and MAE losses, generalized cross-entropy (GCE) has recently been introduced to provide a trade-off between optimization difficulty and robustness. Existing formulations of GCE result in a non-convex optimization over classification margins that is prone to underfitting, leading to poor performance on complex datasets. In this paper, we propose a minimax formulation of generalized cross-entropy (MGCE) that results in a convex optimization over classification margins. Moreover, we show that MGCE can provide an upper bound on the classification error. The proposed bilevel convex optimization can be efficiently implemented using stochastic gradients computed via implicit differentiation. Using benchmark datasets, we show that MGCE achieves strong accuracy, faster convergence, and better calibration, especially in the presence of label noise.
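For context, the GCE loss that the abstract says interpolates between CE and MAE (as introduced by Zhang & Sabuncu, 2018) is $L_q(p_y) = (1 - p_y^q)/q$, where $p_y$ is the predicted probability of the true class and $q \in (0, 1]$; $q \to 0$ recovers cross-entropy $-\log p_y$ and $q = 1$ gives the MAE-style loss $1 - p_y$. A minimal sketch (this illustrates the standard GCE baseline, not the paper's MGCE reformulation):

```python
import numpy as np

def gce_loss(p_y, q):
    """Generalized cross-entropy: (1 - p_y**q) / q.

    p_y : predicted probability assigned to the true class.
    q   : interpolation parameter in (0, 1].
          q -> 0 recovers cross-entropy, -log(p_y);
          q = 1 gives the robust MAE-style loss, 1 - p_y.
    """
    if q == 0.0:
        return -np.log(p_y)
    return (1.0 - p_y ** q) / q

p = 0.7
print(gce_loss(p, 1.0))   # MAE-like: (1 - 0.7) / 1 = 0.3
print(gce_loss(p, 1e-6))  # close to -log(0.7), i.e. about 0.357
```

The non-convexity issue the abstract refers to arises because, for $q > 0$, this loss is non-convex as a function of the classification margin, which is what the paper's convex minimax reformulation is designed to fix.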
Problem

Research questions and friction points this paper is trying to address.

generalized cross-entropy
non-convex optimization
underfitting
classification margins
label noise
Innovation

Methods, ideas, or system contributions that make the work stand out.

minimax optimization
generalized cross-entropy
convex optimization
label noise robustness
implicit differentiation
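To illustrate the implicit-differentiation idea listed above: in a bilevel problem, the outer gradient with respect to a parameter $\lambda$ can be obtained from the inner problem's optimality conditions via the implicit function theorem, $\mathrm{d}w^*/\mathrm{d}\lambda = -(\partial^2 g / \partial w^2)^{-1}\, \partial^2 g / \partial w\, \partial \lambda$, without unrolling the inner solver. The toy example below is a generic sketch of this mechanism, not the paper's actual algorithm; the inner objective $g$ and outer objective $f$ are chosen for illustration only.

```python
# Bilevel toy problem:
#   inner: w*(lam) = argmin_w g(w, lam), with g(w, lam) = 0.5 * (w - lam)**2
#   outer: F(lam)  = f(w*(lam)),         with f(w)      = w**2
# Implicit function theorem on the inner stationarity condition dg/dw = 0:
#   dw*/dlam = -(d2g/dw2)^{-1} * d2g/(dw dlam)

def inner_solution(lam):
    # Closed form here (w* = lam); in practice this is an iterative solver.
    return lam

def hypergradient(lam):
    w_star = inner_solution(lam)
    d2g_dw2 = 1.0         # Hessian of g with respect to w
    d2g_dwdlam = -1.0     # mixed second derivative of g
    dw_dlam = -d2g_dwdlam / d2g_dw2   # implicit derivative = 1
    df_dw = 2.0 * w_star              # gradient of the outer objective
    return df_dw * dw_dlam

print(hypergradient(3.0))  # 6.0, matching d/dlam of F(lam) = lam**2
```

Since $w^*(\lambda) = \lambda$ here, $F(\lambda) = \lambda^2$ and the exact hypergradient is $2\lambda$, which the implicit-differentiation computation reproduces without differentiating through the inner solver's iterations.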