🤖 AI Summary
To address the lack of systematic evaluation benchmarks for class-imbalanced learning (CIL) on tabular data, this paper introduces CLIMB, the first open-source, standardized benchmark designed specifically for tabular CIL. CLIMB comprises 73 real-world datasets and 29 uniformly implemented CIL algorithms (covering resampling, cost-sensitive, ensemble, and recent state-of-the-art methods), together with a standardized API, rigorous code quality assurance, and automated evaluation pipelines. Its key contributions are threefold: (1) it establishes the first high-quality, cross-domain benchmark spanning diverse imbalance ratios; (2) it fills a critical gap in systematic, reproducible CIL evaluation for tabular data; and (3) its large-scale empirical analysis reveals the limited efficacy of naive rebalancing techniques, validates the superiority of ensemble strategies (e.g., RUSBoost, EasyEnsemble), and identifies data quality, rather than imbalance ratio alone, as a decisive factor in CIL performance, thereby providing evidence-based guidance for algorithm selection.
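The summary contrasts imbalance ratio and naive rebalancing as explanations for CIL performance. As a minimal illustration (a sketch, not CLIMB's implementation), the imbalance ratio is commonly computed as the majority-class count divided by the minority-class count, and random undersampling, one of the simplest rebalancing methods, discards samples until every class matches the smallest one. The helper names `imbalance_ratio` and `random_undersample` are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (hypothetical helper)."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return max(counts.values()) / min(counts.values())


def random_undersample(X, y, seed=0):
    """Naive rebalancing: randomly keep, for every class, only as many samples
    as the smallest class has. Illustrative sketch, not CLIMB's implementation."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # sample without replacement, so the smallest class is kept in full
        for xi in rng.sample(rows, n_min):
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    y = [0] * 95 + [1] * 5
    X = [[float(i)] for i in range(100)]
    print(imbalance_ratio(y))  # 19.0
    Xb, yb = random_undersample(X, y)
    print(Counter(yb))  # Counter({0: 5, 1: 5})
```

Undersampling throws away most of the majority class (90 of 95 samples here), which is one intuition for why naive rebalancing alone can be of limited benefit.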
📝 Abstract
Class-imbalanced learning (CIL) on tabular data is important in many real-world applications where the minority class represents critical but rare outcomes. In this paper, we present CLIMB, a comprehensive benchmark for class-imbalanced learning on tabular data. CLIMB includes 73 real-world datasets across diverse domains and imbalance levels, along with unified implementations of 29 representative CIL algorithms. Built on a high-quality open-source Python package with unified API designs, detailed documentation, and rigorous code quality controls, CLIMB supports easy implementation and comparison of different CIL algorithms. Through extensive experiments, we provide practical insights on method accuracy and efficiency, highlighting the limitations of naive rebalancing, the effectiveness of ensembles, and the importance of data quality. Our code, documentation, and examples are available at https://github.com/ZhiningLiu1998/imbalanced-ensemble.
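Ensembles such as EasyEnsemble combine undersampling with multiple learners: each member trains on a balanced subset, while the ensemble as a whole still sees much more of the majority class than a single undersampled model would. The pure-Python sketch below illustrates that idea under simplifying assumptions. It uses a nearest-centroid base learner and majority voting, whereas EasyEnsemble as originally proposed trains an AdaBoost ensemble on each subset. All function names are hypothetical; the actual implementations are in the package linked above.

```python
import random
from collections import Counter, defaultdict


def _centroids(X, y):
    """Nearest-centroid base learner: per-class mean vectors."""
    sums, counts = {}, defaultdict(int)
    for xi, yi in zip(X, y):
        if yi not in sums:
            sums[yi] = [0.0] * len(xi)
        sums[yi] = [s + v for s, v in zip(sums[yi], xi)]
        counts[yi] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def _nearest(centroids, x):
    """Label of the closest centroid under squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))


def fit_balanced_ensemble(X, y, n_estimators=10, seed=0):
    """EasyEnsemble-style training (hypothetical sketch): each member is fit on
    all minority samples plus an equally sized random draw from every other class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    n_min = min(len(rows) for rows in by_class.values())
    members = []
    for _ in range(n_estimators):
        Xs, ys = [], []
        for label, rows in by_class.items():
            Xs.extend(rng.sample(rows, n_min))  # a fresh balanced subset per member
            ys.extend([label] * n_min)
        members.append(_centroids(Xs, ys))
    return members


def predict(members, x):
    """Majority vote over the ensemble members."""
    votes = Counter(_nearest(m, x) for m in members)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # 200 majority points near the origin, 10 minority points near (3, 3)
    X = [[(i % 5) * 0.1, (i % 7) * 0.1] for i in range(200)]
    X += [[3.0 + 0.1 * (i % 3), 3.0] for i in range(10)]
    y = [0] * 200 + [1] * 10
    members = fit_balanced_ensemble(X, y, n_estimators=5)
    print(predict(members, [3.0, 3.0]), predict(members, [0.0, 0.0]))
```

Because each member draws a different majority subset, the ensemble reuses information that a single random undersample would discard.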