When resampling/reweighting improves feature learning in imbalanced classification?: A toy-model study

πŸ“… 2024-09-09
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work investigates the mechanistic impact of resampling and reweighting on feature learning under class imbalance. Using analytically tractable binary and multiclass toy models in the high-dimensional limit, the authors combine a statistical-mechanical replica analysis with symmetry properties of loss functions to characterize when these techniques help. They find that resampling and reweighting are not universally beneficial: when the loss function is symmetric and the data distribution satisfies certain balance conditions, omitting them is provably optimal. The paper also introduces a solvable multiclass analytical model for imbalanced learning, delineating the theoretical boundary at which resampling/reweighting improves generalization. The analysis counters the common heuristic that “resampling always helps,” establishing an interpretable, theoretically grounded criterion for its utility: gains in feature representation occur only under asymmetric losses or specific distributional shifts.

πŸ“ Abstract
A toy model of binary classification is studied with the aim of clarifying the effect of class-wise resampling/reweighting on feature learning performance in the presence of class imbalance. In the analysis, a high-dimensional limit of the feature is taken while the ratio of dataset size to feature dimension is kept finite, and the non-rigorous replica method from statistical mechanics is employed. The result shows that there exists a case in which no resampling/reweighting gives the best feature learning performance, irrespective of the choice of loss or classifier, supporting recent findings in Cao et al. (2019) and Kang et al. (2019). It is also revealed that the key to this result is the symmetry of the loss and the problem setting. Inspired by this, we propose a further simplified model exhibiting the same property in the multiclass setting. These results clarify when class-wise resampling/reweighting becomes effective in imbalanced classification.
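The class-wise reweighting the abstract refers to can be sketched as a per-class multiplier on each example's loss. The snippet below is a minimal illustration (not the paper's replica analysis): a weighted logistic regression on synthetic imbalanced Gaussian data, where all sizes, means, and the inverse-frequency weighting heuristic are illustrative choices.

```python
import numpy as np

# Toy imbalanced binary classification with overlapping Gaussian classes.
# Class-wise reweighting multiplies each example's loss by a weight
# inversely proportional to its class frequency (a common heuristic).
rng = np.random.default_rng(0)
n_maj, n_min, d = 900, 100, 20
X = np.vstack([rng.normal(-0.15, 1.0, (n_maj, d)),   # majority class 0
               rng.normal(+0.15, 1.0, (n_min, d))])  # minority class 1
X = np.hstack([X, np.ones((n_maj + n_min, 1))])      # intercept column
y = np.concatenate([np.zeros(n_maj), np.ones(n_min)])

def fit_logistic(X, y, class_weights, lr=0.5, epochs=2000):
    """Weighted logistic regression trained by full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    sample_w = np.where(y == 1, class_weights[1], class_weights[0])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (sample_w * (p - y)) / len(y)
    return w

n = len(y)
w_plain = fit_logistic(X, y, {0: 1.0, 1: 1.0})                        # no reweighting
w_bal = fit_logistic(X, y, {0: n / (2 * n_maj), 1: n / (2 * n_min)})  # inverse-frequency
```

With inverse-frequency weights the fitted intercept no longer absorbs the log prior odds of the majority class, which typically raises minority-class recall; the paper's point is that whether this actually improves feature learning depends on the loss symmetry and the data distribution.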
Problem

Research questions and friction points this paper is trying to address.

Investigates when resampling improves feature learning in imbalanced classification
Analyzes class-wise reweighting effects on classification performance
Identifies conditions where no resampling yields optimal results
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes resampling effects in an analytically tractable toy model
Employs the non-rigorous replica method from statistical mechanics
Identifies loss symmetry as the key factor
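The loss-symmetry point above can be illustrated numerically: a plain margin loss l(y·f(x)) is invariant under jointly flipping every label and negating every score (i.e., relabeling the two classes), while attaching class-dependent weights breaks that invariance. A minimal sketch with illustrative numbers, not the paper's formal symmetry condition:

```python
import numpy as np

# Margin formulation: labels in {-1, +1}, classifier scores f(x).
# Relabeling the classes means (y, f) -> (-y, -f); an unweighted loss
# of the margin y*f is unchanged, a class-weighted loss is not.
scores = np.array([2.0, -1.0, 0.5, -0.3, 1.5])   # illustrative values
labels = np.array([1.0, 1.0, -1.0, -1.0, 1.0])

def logistic_loss(margins):
    return np.log1p(np.exp(-margins))

risk = logistic_loss(labels * scores).mean()
risk_flipped = logistic_loss((-labels) * (-scores)).mean()  # identical margins

def weighted_risk(labels, scores, w_pos=3.0, w_neg=1.0):
    # Class-dependent weights (arbitrary illustrative choice).
    w = np.where(labels == 1, w_pos, w_neg)
    return (w * logistic_loss(labels * scores)).mean()

wrisk = weighted_risk(labels, scores)
wrisk_flipped = weighted_risk(-labels, -scores)  # differs: weights follow labels
```

Here `risk` equals `risk_flipped` exactly, while `wrisk` and `wrisk_flipped` differ, mirroring the paper's observation that reweighting only matters when the loss or the problem setting breaks this class-exchange symmetry.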
πŸ”Ž Similar Papers
No similar papers found.