🤖 AI Summary
This work investigates the generalization behavior of linear classification under class imbalance in high-dimensional Gaussian mixture models, in the overparameterized regime. We develop the first rigorous closed-form approximation of the test error, derived via high-dimensional asymptotic analysis and random matrix theory. Our framework unifies the bias-correction mechanisms behind calibration strategies—including logit adjustment and class-dependent temperature scaling—and delineates when each strategy applies. The theoretical analysis yields explicit expressions for the optimal adjustment bias and temperature, revealing how these corrections mitigate the systematic bias of standard cross-entropy loss in imbalanced settings. Extensive validation on synthetic data and real-world imbalanced benchmarks (CIFAR-10, MNIST, Fashion-MNIST) demonstrates that our error approximation achieves absolute prediction errors below 2%, significantly outperforming existing empirical tuning approaches.
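To make the calibration mechanism concrete, here is a minimal, hypothetical sketch (not the paper's code) of logit adjustment on an imbalanced two-class Gaussian mixture, using numpy and scikit-learn. The class means ±`mu`, the minority prior `pi`, the dimensions, and the choice of an (effectively unregularized) logistic regression are all illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Two-class Gaussian mixture: means +/- mu, identity covariance,
# with a 9:1 class imbalance (pi is the minority-class prior).
d, n, pi = 200, 1000, 0.1
mu = rng.standard_normal(d) / np.sqrt(d)

def sample(n):
    y = (rng.random(n) < pi).astype(int)  # 1 = minority class
    x = rng.standard_normal((n, d)) + np.where(y[:, None] == 1, mu, -mu)
    return x, y

X_train, y_train = sample(n)
X_test, y_test = sample(50_000)

# Large C makes the fit close to plain (unregularized) cross-entropy.
clf = LogisticRegression(C=1e6, max_iter=2000).fit(X_train, y_train)
logits = X_test @ clf.coef_.ravel() + clf.intercept_

# Logit adjustment: subtract the log-prior ratio at prediction time,
# which enlarges the region predicted as the minority class.
adjusted = logits - (np.log(pi) - np.log(1 - pi))

for name, z in [("plain CE", logits), ("logit-adjusted", adjusted)]:
    pred = (z > 0).astype(int)
    bal_err = 0.5 * ((pred[y_test == 0] == 1).mean()
                     + (pred[y_test == 1] == 0).mean())
    print(f"{name}: balanced error = {bal_err:.3f}")
```

Under plain cross-entropy the learned intercept absorbs the class prior, so the minority class is rarely predicted; the adjusted rule typically has a much lower balanced error, which is the systematic bias the paper's analysis quantifies.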
📝 Abstract
We study class-imbalanced linear classification in a high-dimensional Gaussian mixture model. We develop a tight, closed-form approximation for the test error of several practical learning methods, including logit adjustment and class-dependent temperature scaling. Our approximation allows us to analytically tune and compare these methods, highlighting how and when they overcome the pitfalls of standard cross-entropy minimization. We test our theoretical findings on simulated data and imbalanced CIFAR-10, MNIST, and Fashion-MNIST datasets.
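To illustrate why a closed-form test error is available in this model: for any fixed linear rule on isotropic Gaussian classes, the per-class errors are exact Gaussian tail probabilities. The sketch below checks this standard identity against Monte Carlo; it is the basic mechanism behind such analyses, not the paper's full approximation, and the direction `w`, bias `b`, and mean `mu` are made-up illustrative values.

```python
import numpy as np
from scipy.stats import norm

# For a fixed rule sign(w @ x + b) with x | y=+1 ~ N(+mu, I) and
# x | y=-1 ~ N(-mu, I), the conditional test errors are exactly
#   err_+ = Phi(-(w @ mu + b) / ||w||),  err_- = Phi(-(w @ mu - b) / ||w||).
rng = np.random.default_rng(1)
d = 200
mu = rng.standard_normal(d) / np.sqrt(d)
w = mu + 0.3 * rng.standard_normal(d)  # an imperfect linear direction
b = -0.5                               # e.g. a bias from logit adjustment

wn = np.linalg.norm(w)
err_pos = norm.cdf(-(w @ mu + b) / wn)
err_neg = norm.cdf(-(w @ mu - b) / wn)

# Monte Carlo check of the two conditional errors.
n = 200_000
xp = rng.standard_normal((n, d)) + mu
xn = rng.standard_normal((n, d)) - mu
print(f"class +1: analytic {err_pos:.4f} vs MC {(xp @ w + b <= 0).mean():.4f}")
print(f"class -1: analytic {err_neg:.4f} vs MC {(xn @ w + b > 0).mean():.4f}")
```

The hard part, which the paper addresses, is characterizing the learned `w` and `b` themselves in the high-dimensional regime; once they are pinned down, formulas of this Gaussian-tail form give the test error.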