🤖 AI Summary
In data stream regression, rare samples are unevenly distributed across the entire target range—not merely at extremes—causing severe distributional imbalance that degrades model performance.
Method: This paper proposes a histogram-driven online resampling framework, introducing HistUS (Histogram-based Online Undersampling) and HistOS (Histogram-based Online Oversampling). Unlike Chebyshev inequality–based approaches, which poorly localize rare instances, our methods employ dynamic binning to achieve adaptive rebalancing across the full target distribution. Integrated into a streaming architecture, the framework enables real-time adaptation to concept drift and sudden imbalance shifts.
Results: Evaluated on multiple synthetic and real-world data streams, the approach reduces average MAE by 18.7% over state-of-the-art baselines, demonstrating significant improvements in both predictive accuracy and robustness to distributional changes.
📝 Abstract
Handling imbalanced data streams in regression tasks presents a significant challenge, as rare instances can appear anywhere in the target distribution rather than being confined to its extreme values. In this paper, we introduce novel data-level sampling strategies, exttt{HistUS} and exttt{HistOS}, that utilize histogram-based approaches to dynamically balance data streams. Unlike previous methods based on Chebyshev extquotesingle s inequality, our proposed techniques identify and handle rare cases across the entire distribution effectively. We demonstrate that exttt{HistUS} and exttt{HistOS} outperform traditional methods through extensive experiments on synthetic and real-world datasets, leading to more accurate and robust regression models in streaming environments.