Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting

📅 2023-05-25

🏛️ International Conference on Machine Learning

📈 Citations: 4

✨ Influential: 0

🤖 AI Summary

This paper addresses the lack of theoretical guarantees for stacked generalization and applies the analysis to probabilistic time series forecasting. First, it proves that selecting a stacked ensemble by cross-validated performance does not perform much worse than the oracle best choice, strengthening and extending Van der Laan et al. (2007) via empirical process theory and concentration inequalities. Second, it introduces a structured family of stacked generalizations whose ensemble weights can vary across quantiles, timestamps in the forecast horizon, and items, with each member of the family allowing a different degree of variation. Third, experiments on probabilistic forecasting benchmarks show performance gains over conventional ensemble baselines. The results suggest that letting the weights adapt improves both predictive accuracy and calibration.

📝 Abstract

Ensembling is among the most popular tools in machine learning (ML) due to its effectiveness in minimizing variance and thus improving generalization. Most ensembling methods for black-box base learners fall under the umbrella of "stacked generalization," namely training an ML algorithm that takes the inferences from the base learners as input. While stacking has been widely applied in practice, its theoretical properties are poorly understood. In this paper, we prove a novel result, showing that choosing the best stacked generalization from a (finite or finite-dimensional) family of stacked generalizations based on cross-validated performance does not perform "much worse" than the oracle best. Our result strengthens and significantly extends the results in Van der Laan et al. (2007). Inspired by the theoretical analysis, we further propose a particular family of stacked generalizations in the context of probabilistic forecasting, each one with a different sensitivity for how much the ensemble weights are allowed to vary across items, timestamps in the forecast horizon, and quantiles. Experimental results demonstrate the performance gain of the proposed method.
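
The selection step described in the abstract, picking one stacked generalization from a finite family by cross-validated performance, can be illustrated with a minimal sketch. This is not the paper's implementation: the three stackers (`fit_uniform`, `fit_best_single`, `fit_least_squares`), the squared-error loss, the synthetic data, and the random K-fold split are all assumptions made for illustration. For real time series, a split that respects temporal order would be needed instead of a random permutation.

```python
import numpy as np

def fit_uniform(P, y):
    # Hypothetical stacker 1: equal weights, ignores the training data.
    return np.full(P.shape[1], 1.0 / P.shape[1])

def fit_best_single(P, y):
    # Hypothetical stacker 2: all weight on the base learner with lowest training MSE.
    errs = ((P - y[:, None]) ** 2).mean(axis=0)
    w = np.zeros(P.shape[1])
    w[np.argmin(errs)] = 1.0
    return w

def fit_least_squares(P, y):
    # Hypothetical stacker 3: unconstrained linear stacking weights.
    w, *_ = np.linalg.lstsq(P, y, rcond=None)
    return w

def select_stacker(P, y, family, n_folds=5, seed=0):
    """Return the name of the stacker with the lowest mean validation loss.

    P: (n_samples, n_models) base-learner predictions; y: (n_samples,) targets.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    cv_loss = {}
    for name, fit in family.items():
        losses = []
        for k in range(n_folds):
            val = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            w = fit(P[train], y[train])  # fit weights on the training folds
            losses.append(np.mean((P[val] @ w - y[val]) ** 2))  # score on held-out fold
        cv_loss[name] = float(np.mean(losses))
    best = min(cv_loss, key=cv_loss.get)
    return best, cv_loss

# Synthetic example: three base learners with increasing noise levels.
rng = np.random.default_rng(1)
y = rng.normal(size=200)
P = np.column_stack([y + rng.normal(scale=s, size=200) for s in (0.1, 1.0, 2.0)])
family = {
    "uniform": fit_uniform,
    "best_single": fit_best_single,
    "least_squares": fit_least_squares,
}
best, cv_loss = select_stacker(P, y, family)
print(best, cv_loss)
```

The paper's result concerns how far the cross-validated choice (`best` here) can fall behind the oracle choice, that is, the family member that would perform best on unseen data.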

Problem

Research questions and friction points this paper is trying to address.

Proving theoretical guarantees for stacked generalization in ensemble learning

Extending and strengthening existing theoretical results on ensemble performance

Proposing a novel ensemble method for probabilistic time series forecasting

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proves theoretical guarantees for stacked generalization selection

Introduces a family of stacked generalizations whose ensemble weights can vary across items, timestamps in the forecast horizon, and quantiles

Demonstrates performance gains in probabilistic forecasting applications
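
One way to picture "how much the ensemble weights are allowed to vary" is through array shapes. The sketch below is an assumption-laden illustration, not the paper's parametrization: it stores quantile forecasts as an array of shape `(M, I, T, Q)` (models, items, timestamps, quantiles), turns raw weights into convex weights with a softmax over the model axis, and relies on NumPy broadcasting so that a weight array of shape `(M, 1, 1, 1)` is shared everywhere while `(M, 1, 1, Q)` varies by quantile only. The function names `normalize`, `combine`, and `pinball_loss` are hypothetical.

```python
import numpy as np

def pinball_loss(y, q_pred, quantiles):
    # y: (I, T) targets; q_pred: (I, T, Q) quantile forecasts; quantiles: (Q,) levels.
    diff = y[..., None] - q_pred
    return float(np.mean(np.maximum(quantiles * diff, (quantiles - 1.0) * diff)))

def normalize(raw):
    # Softmax over the model axis (axis 0): weights are positive and sum to 1.
    e = np.exp(raw - raw.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def combine(preds, raw_weights):
    # preds: (M, I, T, Q). raw_weights must broadcast against preds, for example
    # (M, 1, 1, 1) shared, (M, 1, 1, Q) per quantile, (M, 1, T, Q) per timestamp
    # and quantile, or (M, I, T, Q) fully item-, time-, and quantile-specific.
    w = normalize(raw_weights)
    return (w * preds).sum(axis=0)

# Synthetic forecasts: 3 models, 4 items, 5 timestamps, 3 quantile levels.
M, I, T, Q = 3, 4, 5, 3
rng = np.random.default_rng(0)
quantiles = np.array([0.1, 0.5, 0.9])
y = rng.normal(size=(I, T))
preds = y[None, :, :, None] + rng.normal(scale=0.5, size=(M, I, T, Q))

shared = combine(preds, np.zeros((M, 1, 1, 1)))           # plain average of the models
per_quantile = combine(preds, rng.normal(size=(M, 1, 1, Q)))  # weights differ by quantile

print(pinball_loss(y, shared, quantiles))
print(pinball_loss(y, per_quantile, quantiles))
```

Each weight shape corresponds to one candidate stacked generalization, so a family of this kind could be scored by cross-validation and the best member selected. More flexible shapes have more parameters to fit, which is the trade-off such a selection step would arbitrate.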