Rebalancing with Calibrated Sub-classes (RCS): An Enhanced Approach for Robust Imbalanced Classification

📅 2025-10-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address classifier bias toward majority classes caused by class imbalance, this paper proposes a synthetic sample generation method based on distribution calibration. Its core idea is to estimate minority-class distribution parameters from weighted Gaussian mixture components fitted on the neighboring majority and intermediate classes. An encoder-decoder network preserves the structure of the data, and synthetic samples are generated from its feature vectors. This strategy mitigates the overgeneralization that arises when minority-class statistics are approximated from the majority class alone. Experiments on image, text, and tabular datasets show that the method outperforms mainstream baselines such as SMOTE, ADASYN, and CTGAN on metrics including F1-score and G-mean.

📝 Abstract

The class imbalance problem refers to the insufficiency of data in certain classes, which biases a classifier toward the majority class. Distribution calibration is a technique that seeks to estimate a more accurate class distribution from an observed or estimated one. To address class imbalance, we propose a distribution calibration-based method, Rebalancing with Calibrated Sub-classes (RCS), which estimates the distribution parameters of the minority classes using weighted parameters derived from a mixture of Gaussian components from both the majority and intermediate classes. An encoder-decoder network is trained to preserve the structure of the imbalanced data and prevent disentanglement. After training, feature vectors extracted from the encoder are used to generate synthetic samples through our distribution calibration strategy. This approach mitigates the overgeneralization problem that arises when only the distribution of the majority class is used to approximate the minority class statistics; instead, our method calibrates the parameters by leveraging the distribution of data points in neighboring regions. Experimental results demonstrate that the proposed method achieves superior classification performance compared to several baseline and state-of-the-art techniques across a diverse range of image, text, and tabular datasets.
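The calibration step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting scheme or the exact calibration formula, so everything specific here is an assumption made for illustration: the function names `calibrate` and `sample`, the parameters `k` and `alpha`, the softmax-of-negative-distance weights, the simple averaging of means, and the use of diagonal variances. It should not be read as the authors' implementation.

```python
import math
import random


def mean_vec(rows):
    """Per-dimension mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def calibrate(minority, components, k=2, alpha=0.1):
    """Estimate a minority-class mean and diagonal variance.

    minority:   feature vectors of the minority class (e.g. encoder outputs).
    components: (mean, variance) pairs, one per Gaussian sub-class taken from
                the majority and intermediate classes (diagonal variance).
    k, alpha:   hypothetical hyperparameters: number of neighboring
                components to borrow from, and a variance offset.
    """
    mu_min = mean_vec(minority)
    dists = [math.dist(mu_min, mean) for mean, _ in components]
    # Assumption: only the k nearest components count as "neighboring".
    nearest = sorted(range(len(components)), key=dists.__getitem__)[:k]
    # Assumption: closer components receive larger weights (softmax of -distance).
    raw = [math.exp(-dists[i]) for i in nearest]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(mu_min)
    borrowed_mu = [
        sum(w * components[i][0][d] for w, i in zip(weights, nearest))
        for d in range(dim)
    ]
    borrowed_var = [
        sum(w * components[i][1][d] for w, i in zip(weights, nearest))
        for d in range(dim)
    ]
    # Assumption: blend the observed minority mean with the borrowed mean.
    mu = [(a + b) / 2 for a, b in zip(mu_min, borrowed_mu)]
    var = [v + alpha for v in borrowed_var]
    return mu, var


def sample(mu, var, n, rng):
    """Draw n synthetic feature vectors from the calibrated diagonal Gaussian."""
    return [[rng.gauss(m, math.sqrt(v)) for m, v in zip(mu, var)] for _ in range(n)]
```

In use, `calibrate` would receive the few available minority feature vectors together with the sub-class statistics, and `sample(mu, var, n, random.Random(0))` would then draw `n` synthetic feature vectors to rebalance the training set. Restricting the borrowed statistics to nearby components, rather than the whole majority class, is what the abstract credits with reducing overgeneralization.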

Problem

Research questions and friction points this paper is trying to address.

Addresses classifier bias from insufficient minority class data

Mitigates overgeneralization by leveraging neighboring class distributions

Generates synthetic samples through calibrated distribution parameters

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Gaussian mixture models for distribution calibration

Employs encoder-decoder network to preserve data structure

Generates synthetic samples from calibrated feature vectors
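As a rough illustration of how a class might be split into Gaussian sub-classes, the sketch below fits a one-dimensional Gaussian mixture with expectation-maximization. This is a toy under stated assumptions: the paper works on encoder feature vectors rather than scalars, and the function name `fit_gmm_1d`, the quantile-based initialization, and the `min_var` floor are hypothetical choices, not details from the source.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, k=2, iters=50, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture with EM.

    Returns a list of (weight, mean, variance) tuples, one per sub-class.
    """
    n = len(xs)
    ordered = sorted(xs)
    # Assumption: initialize the means at evenly spaced quantiles of the data.
    mus = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(xs) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in xs) / n, min_var)
    variances = [overall_var] * k
    pis = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [pis[j] * normal_pdf(x, mus[j], variances[j]) for j in range(k)]
            s = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                min_var,
            )
            pis[j] = nj / n
    return list(zip(pis, mus, variances))
```

Each returned tuple describes one sub-class; the means and variances of such components are the kind of statistics a calibration step could borrow when estimating minority-class parameters.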
