Sampling Imbalanced Data with Multi-objective Bilevel Optimization

📅 2025-06-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the poor minority-class recognition caused by class imbalance in binary classification, this paper proposes MOODS, a multi-objective bilevel optimization framework that jointly optimizes synthetic oversampling (e.g., SMOTE-based extensions) and majority-class undersampling. It introduces a novel ε/δ non-overlapping diversity validation metric, claimed to be the first to quantitatively assess how a resampling strategy affects model generalization. MOODS explicitly models intra-class and inter-class diversity, thereby mitigating the overfitting commonly induced by conventional reweighting or single-stage resampling. Evaluated on multiple benchmark imbalanced datasets, MOODS achieves state-of-the-art performance, improving F1 scores by 1–15% over existing methods. Crucially, empirical analysis reveals a statistically significant positive correlation between diversity enhancement and performance gain. This work establishes a new, interpretable, optimization-friendly paradigm for imbalanced learning.

📝 Abstract
Two-class classification problems are often characterized by an imbalance between the number of majority and minority datapoints, resulting in poor classification of the minority class in particular. Traditional approaches, such as reweighting the loss function or naïve resampling, risk overfitting and subsequently fail to improve classification because they do not consider the diversity between majority and minority datasets. Such consideration is infeasible because there is no metric that can measure the impact of imbalance on the model. To obviate these challenges, we make two key contributions. First, we introduce MOODS (Multi-Objective Optimization for Data Sampling), a novel multi-objective bilevel optimization framework that guides both synthetic oversampling and majority undersampling. Second, we introduce a validation metric -- the 'ε/δ non-overlapping diversification metric' -- that quantifies the goodness of a sampling method towards model performance. With this metric we experimentally demonstrate state-of-the-art performance, with improvement in diversity driving a 1--15% increase in F1 scores.
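The abstract's two-sided resampling idea (synthesize minority points, discard majority points) can be sketched without the paper's bilevel optimizer. Below is a minimal, SMOTE-style illustration using only NumPy; the function names, the fixed sampling sizes, and the Gaussian toy data are all illustrative assumptions, not the paper's method (MOODS would instead choose the sampling configuration via its multi-objective bilevel search):

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, rng=None):
    """Generate synthetic minority points by interpolating each sample
    toward one of its k nearest minority neighbors (SMOTE-style)."""
    if rng is None:
        rng = np.random.default_rng(0)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # distances from the chosen point to every other minority point
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]   # skip the point itself
        j = rng.choice(neighbors)
        lam = rng.random()                   # interpolation factor in [0, 1]
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synth)

def random_undersample(X_maj, n_keep, rng=None):
    """Keep a uniformly random subset of the majority class."""
    if rng is None:
        rng = np.random.default_rng(0)
    idx = rng.choice(len(X_maj), size=n_keep, replace=False)
    return X_maj[idx]

# Toy imbalanced data: 200 majority vs 20 minority points in 2-D.
rng = np.random.default_rng(42)
X_maj = rng.normal(0.0, 1.0, size=(200, 2))
X_min = rng.normal(2.0, 0.5, size=(20, 2))

# Oversample the minority to 100 and undersample the majority to 100.
X_min_new = np.vstack([X_min, smote_like_oversample(X_min, 80, rng=rng)])
X_maj_new = random_undersample(X_maj, 100, rng=rng)
print(len(X_min_new), len(X_maj_new))  # prints: 100 100
```

In MOODS the amounts and placement of oversampled/undersampled points are not fixed constants like the `80` and `100` above; they are decision variables of the outer optimization, scored against diversity objectives.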
Problem

Research questions and friction points this paper is trying to address.

Addresses imbalanced two-class classification problems
Proposes multi-objective optimization for data sampling
Introduces metric to evaluate sampling method effectiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-objective bilevel optimization for sampling
Novel ε/δ non-overlapping diversification metric
Combines synthetic oversampling and majority undersampling
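The paper's exact ε/δ metric is not reproduced on this page, but the "non-overlapping" intuition can be illustrated with a toy quantity: the fraction of minority points whose ε-ball contains no majority point. This is a hypothetical stand-in for building intuition only, not the MOODS definition:

```python
import numpy as np

def non_overlap_fraction(X_min, X_maj, eps):
    """Toy overlap check (NOT the paper's epsilon/delta metric):
    fraction of minority points with no majority point within eps."""
    # Pairwise distances: shape (n_min, n_maj).
    d = np.linalg.norm(X_min[:, None, :] - X_maj[None, :, :], axis=-1)
    return float(np.mean(d.min(axis=1) > eps))

X_min = np.array([[0.0, 0.0], [5.0, 5.0]])
X_maj = np.array([[0.1, 0.0]])
print(non_overlap_fraction(X_min, X_maj, eps=0.5))  # prints: 0.5
```

A resampling strategy that raises a score like this spreads minority points away from the majority region, which is the kind of diversity effect the abstract links to the reported F1 gains.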