Correcting Performance Estimation Bias in Imbalanced Classification with Minority Subconcepts

📅 2026-04-28

🤖 AI Summary
This work addresses a limitation of class-level evaluation metrics: they can hide performance disparities among subconcepts within a class, particularly when classes are imbalanced and subconcept distributions are skewed, which biases the assessment. To mitigate this without requiring ground-truth subconcept labels at test time, the authors propose a utility-weighted evaluation framework that derives soft, uncertainty-aware weights from the posterior probabilities of a multiclass subconcept model, yielding predicted-weighted balanced accuracy (pBA). The weights are computed from predicted probabilities alone. Experiments on tabular, medical-imaging, and text datasets show that conventional unweighted metrics can be misleading under within-class heterogeneity, whereas pBA gives more stable and interpretable assessments when subconcept distributions are uneven but not pathological.
📝 Abstract
Class-level evaluation can conceal substantial performance disparities across subconcepts within the same class, causing models that perform well on average to fail on specific subpopulations. Prior work has shown that common evaluation measures for imbalanced classification are biased toward larger minority subconcepts and that utility-based reweighting using true subconcept labels can mitigate this bias; however, such labels are rarely available at test time. We introduce a practical utility-weighted evaluation that replaces unavailable subconcept labels with predicted posterior probabilities from a multiclass subconcept model. Evaluation weights are defined as the expected utility under this posterior, yielding a soft, uncertainty-aware metric we call predicted-weighted balanced accuracy (pBA). Experiments on tabular benchmarks as well as medical-imaging and text datasets show that unweighted scores can be misleading under within-class heterogeneity, while pBA provides more stable and interpretable assessments when subconcept distributions are uneven but not pathological. Our code is available at: https://anonymous.4open.science/r/correcting-bias-imbalance-9C6C/.
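The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the paper's exact definition of pBA may differ. It assumes that each test sample has a posterior over subconcepts, that each subconcept has a scalar utility, and that the sample weight is the expected utility under that posterior, plugged into a weighted balanced accuracy (the mean of per-class weighted recalls). The function names, the utility values, and the toy data are all hypothetical.

```python
from typing import List, Sequence


def expected_utility_weights(
    posteriors: Sequence[Sequence[float]],
    utilities: Sequence[float],
) -> List[float]:
    """Per-sample weight: expected utility under the subconcept posterior.

    posteriors[i][k] is the predicted probability that sample i belongs to
    subconcept k; utilities[k] is an assumed scalar utility for subconcept k.
    """
    return [sum(p * u for p, u in zip(row, utilities)) for row in posteriors]


def weighted_balanced_accuracy(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    weights: Sequence[float],
) -> float:
    """Mean over classes of the weighted recall for each class."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(w for t, w in zip(y_true, weights) if t == c)
        if total == 0:
            continue  # skip classes that carry no weight
        hit = sum(
            w for t, p, w in zip(y_true, y_pred, weights) if t == c and p == c
        )
        recalls.append(hit / total)
    if not recalls:
        raise ValueError("no class has positive total weight")
    return sum(recalls) / len(recalls)


# Hypothetical toy data: class 1 is the minority class with a large
# subconcept (index 1) and a small one (index 2); class 0 is subconcept 0.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]  # the model misses the small-subconcept sample
posteriors = [
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.5],  # uncertain between the two minority subconcepts
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
utilities = [1.0, 1.0, 3.0]  # assumed: up-weight the small subconcept

weights = expected_utility_weights(posteriors, utilities)
unweighted = weighted_balanced_accuracy(y_true, y_pred, [1.0] * len(y_true))
weighted = weighted_balanced_accuracy(y_true, y_pred, weights)
print(round(unweighted, 3), round(weighted, 3))
```

On this toy data the unweighted score is 0.875, while the expected-utility weighting lowers it to about 0.8, because the single error falls on a sample that is likely to belong to the small subconcept. Because the weights come from posteriors rather than hard subconcept assignments, an uncertain sample contributes a blend of the utilities instead of being committed to one subconcept.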
Problem

Research questions and friction points this paper is trying to address.

imbalanced classification
performance estimation bias
minority subconcepts
evaluation metrics
within-class heterogeneity
Innovation

Methods, ideas, or system contributions that make the work stand out.

imbalanced classification
subconcepts
utility-weighted evaluation
posterior probability
balanced accuracy