🤖 AI Summary
Existing ensemble methods for streaming data mining neglect computational cost, failing to balance predictive performance with sustainability under resource constraints.
Method: This paper proposes HEROS, a heterogeneous online ensemble framework that models the performance–resource trade-off as a Markov decision process (MDP) and introduces a provably near-optimal ζ-policy for dynamically selecting low-overhead model subsets for online updates. HEROS integrates heterogeneous model pool construction, incremental ensemble learning, and stochastic-model-based policy optimization.
Contribution/Results: Evaluated on 11 benchmark streaming datasets, HEROS achieves state-of-the-art (SOTA) predictive accuracy while significantly reducing computational energy consumption and memory footprint—outperforming current best methods on several key metrics. Its principled MDP formulation and ζ-policy provide theoretical guarantees on near-optimality under resource constraints, advancing sustainable real-time stream mining.
📝 Abstract
Ensemble methods for stream mining necessitate managing multiple models and updating them as data distributions evolve. However, amid growing calls for sustainability, established methods pay insufficient attention to ensemble members' computational expenses and instead focus almost exclusively on predictive capability. To address these challenges and enable green online learning, we propose heterogeneous online ensembles (HEROS). At every training step, HEROS chooses, under resource constraints, a subset of models to train from a pool of models initialized with diverse hyperparameter choices. We introduce a Markov decision process to theoretically capture the trade-off between predictive performance and sustainability constraints. Based on this framework, we present different policies for choosing which models to train on incoming data. Most notably, we propose the novel $ζ$-policy, which focuses on training near-optimal models at reduced cost. Using a stochastic model, we theoretically prove that our $ζ$-policy achieves near-optimal performance while using fewer resources than the best-performing policy. In experiments across 11 benchmark datasets, we find empirical evidence that our $ζ$-policy is a strong contribution to the state-of-the-art, achieving highly accurate predictions, in some cases even outperforming competitors, while being far more resource-friendly.
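The selection mechanism described above can be sketched in a few lines. This is a hypothetical illustration only: the `Member` class, the loss-decay stand-in for incremental training, and the greedy "within ζ of the best, cheapest first, under a budget" scoring rule are assumptions for exposition, not the paper's exact ζ-policy or its MDP formulation.

```python
# Illustrative sketch of a HEROS-style per-step selection loop.
# All names and the selection heuristic are assumptions, not the paper's method.

class Member:
    def __init__(self, name, cost):
        self.name = name    # identifier for the pool member
        self.cost = cost    # per-update resource cost (abstract units)
        self.loss = 1.0     # running estimate of predictive loss

    def partial_fit(self, x, y):
        # Stand-in for an incremental update; a real member would be an
        # online learner (e.g. a Hoeffding tree) updated on (x, y).
        self.loss *= 0.95

def select_subset(pool, budget, zeta=0.1):
    """Greedy stand-in for a zeta-style policy: consider members whose
    estimated loss is within zeta of the current best, then pick them
    cheapest first until the resource budget is exhausted."""
    best = min(m.loss for m in pool)
    near_optimal = [m for m in pool if m.loss <= best + zeta]
    chosen, spent = [], 0.0
    for m in sorted(near_optimal, key=lambda m: m.cost):
        if spent + m.cost <= budget:
            chosen.append(m)
            spent += m.cost
    return chosen

# Toy usage: three heterogeneous members, a tiny data stream, budget of 3.
pool = [Member("fast", 1.0), Member("medium", 2.0), Member("slow", 4.0)]
for x, y in [(0, 0), (1, 1), (2, 0)]:
    for m in select_subset(pool, budget=3.0):
        m.partial_fit(x, y)
```

In this toy run, the expensive member is never selected because the two cheaper members already fill the budget, mirroring the paper's point that near-optimal predictive performance can be maintained while skipping costly updates.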