🤖 AI Summary
The small sample imbalance (S&I) learning problem suffers from severe generalization degradation and low interpretability due to scarce samples, extreme class imbalance, and ambiguous inter-class feature distributions. This paper proposes a systematic analytical framework tailored to S&I learning, advocating that quantitative characterization of data properties, such as the imbalance ratio and geometric complexity, must precede algorithmic design. The framework unifies multi-dimensional imbalance metrics, data complexity analysis, resampling strategies, classifier adaptation mechanisms, and an interpretable evaluation benchmark. Empirical evaluation on binary and multi-class extreme-imbalance benchmarks reveals that classifier selection has a significantly greater impact on performance than resampling improvements, exposing a blind spot in prevailing heuristic-driven approaches. This work establishes a theory-guided analytical paradigm and practical design principles for S&I learning, advancing both methodological rigor and empirical reproducibility.
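To make the "characterize the data before designing the algorithm" advice concrete, here is a minimal sketch of two such quantitative characterizations: the imbalance ratio (majority-to-minority class size) and Fisher's discriminant ratio, a standard geometric-complexity measure of how separable two classes are along a single feature. The function names and the tiny label set are illustrative, not from the paper.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of majority- to minority-class size; 1.0 means perfectly balanced."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def fisher_ratio(feature_a, feature_b):
    """Fisher's discriminant ratio for one feature across two classes.

    Larger values mean the classes are easier to separate along this
    feature; values near 0 indicate heavily overlapping distributions.
    """
    def mean(v):
        return sum(v) / len(v)

    def var(v):  # population variance
        m = mean(v)
        return sum((x - m) ** 2 for x in v) / len(v)

    return (mean(feature_a) - mean(feature_b)) ** 2 / (var(feature_a) + var(feature_b))


# Illustrative S&I setting: 90 majority vs. 10 minority samples.
y = [0] * 90 + [1] * 10
print(imbalance_ratio(y))  # 9.0
```

A dataset can score high on imbalance yet low on complexity (well-separated classes), or vice versa; the framework's point is that these two axes call for different remedies.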
📝 Abstract
The small sample imbalance (S&I) problem is a major challenge in machine learning and data analysis. It is characterized by a small number of samples and an imbalanced class distribution, which lead to poor model performance. Indistinct inter-class feature distributions further complicate classification. Existing methods often rely on algorithmic heuristics without sufficiently analyzing the underlying data characteristics. We argue that a detailed analysis from the data perspective is essential before developing an appropriate solution. This paper therefore proposes a systematic analytical framework for the S&I problem. First, we summarize imbalance metrics and complexity analysis methods, highlighting the need for interpretable benchmarks to characterize S&I problems. Second, we review recent solutions for conventional, complexity-based, and extreme S&I problems, revealing methodological differences in handling various data distributions. Our review finds that resampling remains a widely adopted solution. However, our experiments on binary and multi-class datasets reveal that performance differences between classifiers significantly exceed the improvements achieved through resampling. Finally, we highlight open questions and discuss future trends.
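As a reference point for the resampling baseline the abstract discusses, the sketch below shows random oversampling, the simplest such strategy: minority-class samples are duplicated until every class matches the majority-class count. This is a generic illustration of the technique, not the paper's specific experimental setup; the function name and the toy data are assumptions.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate minority-class samples (with replacement) until every
    class reaches the majority-class count."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(samples) for samples in by_class.values())

    X_res, y_res = [], []
    for label, samples in by_class.items():
        X_res.extend(samples)                 # keep all originals
        y_res.extend([label] * len(samples))
        extra = target - len(samples)         # how many duplicates needed
        X_res.extend(rng.choice(samples) for _ in range(extra))
        y_res.extend([label] * extra)
    return X_res, y_res


# Toy set: three majority samples, one minority sample.
X = [[0.1], [0.2], [0.3], [0.9]]
y = [0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # Counter({0: 3, 1: 3})
```

The paper's empirical finding is precisely that gains from steps like this are often dwarfed by the effect of choosing a different classifier, which is why resampling alone should not be the default first move.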