🤖 AI Summary
Existing ensemble methods for streaming data mining neglect computational cost, failing to balance predictive performance with sustainability under resource constraints.
Method: This paper proposes HEROS, a heterogeneous online ensemble framework that models the performance–resource trade-off as a Markov decision process (MDP) and introduces a provably near-optimal ζ-policy for dynamically selecting low-overhead model subsets for online updates. HEROS integrates heterogeneous model pool construction, incremental ensemble learning, and stochastic-model-based policy optimization.
Contribution/Results: Evaluated on 11 benchmark streaming datasets, HEROS achieves state-of-the-art (SOTA) predictive accuracy while significantly reducing computational energy consumption and memory footprint—outperforming current best methods on several key metrics. Its principled MDP formulation and ζ-policy provide theoretical guarantees on near-optimality under resource constraints, advancing sustainable real-time stream mining.
📝 Abstract
Ensemble methods for stream mining necessitate managing multiple models and updating them as data distributions evolve. However, amid growing calls for sustainability, established methods pay insufficient attention to ensemble members' computational expenses and instead focus almost exclusively on predictive capability. To address these challenges and enable green online learning, we propose heterogeneous online ensembles (HEROS). At every training step, HEROS chooses, under resource constraints, a subset of models to train from a pool of models initialized with diverse hyperparameter choices. We introduce a Markov decision process to theoretically capture the trade-off between predictive performance and sustainability constraints. Based on this framework, we present different policies for choosing which models to train on incoming data. Most notably, we propose the novel $ζ$-policy, which focuses on training near-optimal models at reduced cost. Using a stochastic model, we theoretically prove that our $ζ$-policy achieves near-optimal performance while using fewer resources than the best-performing policy. In experiments across 11 benchmark datasets, we find empirical evidence that our $ζ$-policy is a strong contribution to the state-of-the-art, achieving highly accurate predictions, in some cases even outperforming competitors, while being far more resource-friendly.
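The selection mechanism described above can be sketched in a few lines. This is a hypothetical illustration only: the `Member` class, the loss-decay stand-in for incremental training, and the greedy "within ζ of the best, cheapest first, under a budget" scoring rule are assumptions for exposition, not the paper's exact ζ-policy or its MDP formulation.

```python
# Illustrative sketch of a HEROS-style per-step selection loop.
# All names and the selection heuristic are assumptions, not the paper's method.

class Member:
    def __init__(self, name, cost):
        self.name = name    # identifier for the pool member
        self.cost = cost    # per-update resource cost (abstract units)
        self.loss = 1.0     # running estimate of predictive loss

    def partial_fit(self, x, y):
        # Stand-in for an incremental update; a real member would be an
        # online learner (e.g. a Hoeffding tree) updated on (x, y).
        self.loss *= 0.95

def select_subset(pool, budget, zeta=0.1):
    """Greedy stand-in for a zeta-style policy: consider members whose
    estimated loss is within zeta of the current best, then pick them
    cheapest first until the resource budget is exhausted."""
    best = min(m.loss for m in pool)
    near_optimal = [m for m in pool if m.loss <= best + zeta]
    chosen, spent = [], 0.0
    for m in sorted(near_optimal, key=lambda m: m.cost):
        if spent + m.cost <= budget:
            chosen.append(m)
            spent += m.cost
    return chosen

# Toy usage: three heterogeneous members, a tiny data stream, budget of 3.
pool = [Member("fast", 1.0), Member("medium", 2.0), Member("slow", 4.0)]
for x, y in [(0, 0), (1, 1), (2, 0)]:
    for m in select_subset(pool, budget=3.0):
        m.partial_fit(x, y)
```

In this toy run, the expensive member is never selected because the two cheaper members already fill the budget, mirroring the paper's point that near-optimal predictive performance can be maintained while skipping costly updates.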