🤖 AI Summary
This study systematically evaluates five class imbalance handling methods in biomedical binary classification: random undersampling (RUS), random oversampling (ROS), SMOTE, re-weighting (RW), and direct F1-score optimization (DMO). For the first time, it investigates the interaction between model complexity and data modality (tabular, text, and image) within a unified experimental framework. The evaluation spans architectures ranging from logistic regression and random forests to MLPs, BiLSTMs, BERT, DenseNet, and DINOv2. Results show that ROS and RW consistently improve the performance of complex models, while direct F1-score optimization is useful primarily on unstructured text and image data. In contrast, RUS and SMOTE consistently degrade performance. The efficacy of imbalance handling therefore depends on the interplay between model complexity and data modality.
📝 Abstract
Objective: The primary goal of this study was to systematically examine the impact of commonly used imbalance handling methods (IHMs) on predictive performance in biomedical binary classification, considering the interplay between model complexity and diverse data modalities.
Materials and Methods: We evaluated five representative IHMs against a raw training (RAW) baseline: random undersampling (RUS), random oversampling (ROS), SMOTE, re-weighting (RW), and direct F1-score optimization (DMO). The evaluation covered three public biomedical datasets, each representing a common biomedical data modality: MIMIC-III (tabular), ADE-Corpus-V2 (text), and MURA (image). To vary model complexity, we employed a range of architectures, from classical logistic regression and random forest to deep neural networks, including multilayer perceptron (MLP), BiLSTM, BERT, DenseNet, and DINOv2.
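As a rough illustration of three of the IHMs named above (ROS, RUS, and RW), the following minimal NumPy sketch shows how each one changes the training set or the per-class loss weights. It is not the study's implementation: the function names are hypothetical, and the inverse-frequency weighting formula is one common heuristic assumed here, since the abstract does not state which weighting scheme was used.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """ROS: resample minority rows with replacement until all classes match the majority count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts = []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        # Draw the missing rows (zero for the majority class) with replacement.
        extra = rng.choice(idx, size=target - n, replace=True)
        parts.append(np.concatenate([idx, extra]))
    order = rng.permutation(np.concatenate(parts))
    return X[order], y[order]

def random_undersample(X, y, seed=0):
    """RUS: keep a random subset of each class, sized to the minority count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()
    parts = [
        rng.choice(np.flatnonzero(y == c), size=target, replace=False)
        for c in classes
    ]
    order = rng.permutation(np.concatenate(parts))
    return X[order], y[order]

def balanced_class_weights(y):
    """RW (assumed heuristic): weight_c = n_samples / (n_classes * count_c)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))
```

With 90 negatives and 10 positives, `random_oversample` returns 180 rows (90 per class), `random_undersample` returns 20 rows (10 per class), and `balanced_class_weights` assigns the positive class a weight of 5.0. ROS and RW keep every original example, whereas RUS discards most of the majority class.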
Results: For simpler models such as logistic regression on tabular data, IHMs yielded no significant advantage over the RAW baseline, aligning with prior findings. However, clear benefits were observed for more complex models and unstructured data: (a) ROS and RW consistently enhanced the performance of powerful models; (b) direct F1-score optimization demonstrated utility primarily for unstructured text and image data; and (c) RUS and SMOTE consistently degraded performance and are therefore not recommended.
Conclusion: The effectiveness of IHMs depends on both model complexity and data modality. Performance gains are most pronounced when appropriate IHMs, such as ROS, RW, and DMO, are applied to high-complexity models.
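The abstract does not say how the F1-score is optimized directly. One common approach, assumed here purely for illustration, is a "soft" F1 surrogate in which predicted probabilities stand in for hard 0/1 decisions, so that the metric becomes differentiable. The hypothetical `soft_f1_loss` below sketches the idea in NumPy.

```python
import numpy as np

def soft_f1_loss(probs, labels, eps=1e-8):
    """Return 1 - soft F1 for binary labels, using probabilities as soft counts.

    Assumption: this surrogate is a common stand-in for direct F1 optimization,
    not necessarily the formulation used in the study.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    tp = np.sum(probs * labels)            # soft true positives
    fp = np.sum(probs * (1.0 - labels))    # soft false positives
    fn = np.sum((1.0 - probs) * labels)    # soft false negatives
    soft_f1 = 2.0 * tp / (2.0 * tp + fp + fn + eps)
    return 1.0 - soft_f1
```

The loss is close to 0 when the probabilities match the labels exactly and equals 1 when every prediction is confidently wrong. In a deep learning framework, the same expression would be written with that framework's differentiable tensor operations and computed per mini-batch.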