A Review of Machine Learning Techniques in Imbalanced Data and Future Trends

📅 2023-10-11
🏛️ arXiv.org
📈 Citations: 10
Influential: 1
🤖 AI Summary
Class imbalance biases model training and undermines the validity of evaluation across domains. Method: This paper systematically reviews 258 peer-reviewed publications (2003–2023) and proposes the first cross-domain, multi-dimensional taxonomy for imbalanced learning, unifying sampling-based methods (e.g., SMOTE, ADASYN), cost-sensitive learning, ensemble techniques (e.g., EasyEnsemble, RUSBoost), deep learning adaptations, and evaluation metrics (F1, G-mean, AUC-PR). Contribution/Results: The paper constructs a full-stack knowledge graph spanning preprocessing, modeling, evaluation, and deployment, and introduces the first evaluation selection guideline tailored to large-scale, real-world imbalanced applications. The framework lowers practical adoption barriers in high-skew domains such as financial risk control and medical diagnosis, and identifies emerging research frontiers, including the integration of self-supervised and causal learning.
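To make the point about evaluation validity concrete, the following minimal sketch computes accuracy, F1, and G-mean from scratch for a binary problem. It is an illustration, not code from the paper: the function name `imbalance_metrics` and the toy labels (95 negatives, 5 positives) are assumptions chosen for the example, and AUC-PR is left out because it needs ranked scores rather than hard predictions.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return accuracy, F1, and G-mean for binary labels (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Recall (sensitivity): fraction of true positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # G-mean is the geometric mean of recall and specificity.
    g_mean = math.sqrt(recall * specificity)
    return {"accuracy": accuracy, "f1": f1, "g_mean": g_mean}


# Assumed toy data: 95 majority-class (0) and 5 minority-class (1) examples.
y_true = [0] * 95 + [1] * 5

# A degenerate classifier that always predicts the majority class.
always_negative = [0] * 100
baseline = imbalance_metrics(y_true, always_negative)

# A detector that flags 10 negatives by mistake but catches 4 of 5 positives.
detector_pred = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1
detector = imbalance_metrics(y_true, detector_pred)

print(baseline)
print(detector)
```

On this toy data the always-negative baseline reaches 95% accuracy while its F1 and G-mean are both zero, whereas the detector has lower accuracy but nonzero F1 and G-mean. This is the kind of gap that motivates the imbalance-aware metrics listed in the summary.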
📝 Abstract
For over two decades, detecting rare events has been a challenging task for researchers in the data mining and machine learning domains. Real-life problems inspire researchers to navigate and further improve data processing and algorithmic approaches to achieve effective and computationally efficient methods for imbalanced learning. In this paper, we have collected and reviewed 258 peer-reviewed papers from archival journals and conference proceedings in an attempt to provide an in-depth review of various approaches in imbalanced learning from technical and application perspectives. This work aims to provide a structured review of methods used to address the problem of imbalanced data in various domains and to create a general guideline for researchers in academia or industry who want to dive into the broad field of machine learning using large-scale imbalanced data.
Problem

Research questions and friction points this paper is trying to address.

Reviewing machine learning techniques for imbalanced data
Addressing rare event detection challenges in real-world applications
Providing guidelines for handling large-scale imbalanced datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reviewing 258 papers on imbalanced learning
Analyzing technical and application perspectives
Creating guidelines for large-scale imbalanced data
Elaheh Jafarigol
School of Industrial and Systems Engineering, University of Oklahoma, 202 W. Boyd St., Room 124, Norman, Oklahoma 73019, USA
Theodore Trafalis
School of Industrial and Systems Engineering, The University of Oklahoma
Optimization, Machine Learning, Data Mining, Big Data, Complex Systems