🤖 AI Summary
While classical underparameterized ensembles improve generalization, modern overparameterized neural network ensembles often exhibit no such benefit, and the underlying mechanism has remained unclear.
Method: Using ensembles of random feature (RF) regressors as a tractable model, we rigorously prove that infinite ensembles of overparameterized RF regressors are pointwise equivalent to a single infinite-width RF regressor. Analyzing both ridgeless regression, where the equivalence is exact, and small-ridge regression, where it is approximate, we decompose the generalization error and the prediction variance.
Contribution: We provide the first theoretical demonstration that overparameterized ensembles achieve generalization performance nearly identical to that of a single large model. Crucially, we show that the prediction variance across ensemble members primarily reflects the expected effect of increased model capacity, not epistemic uncertainty, challenging the long-standing heuristic that ensembles must outperform their individual members. Our analysis reveals that in the overparameterized regime, ensemble averaging fails to reduce variance meaningfully because the members' predictions are highly correlated. This fundamentally revises conventional wisdom about the benefits of ensembles in deep learning.
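The core equivalence lends itself to a quick numerical check. Below is a minimal sketch, not taken from the paper: it uses ridgeless (minimum-norm) regression on random ReLU features over synthetic data, and the helper `rf_predict`, the widths, and the ensemble size are all illustrative assumptions. Averaging many overparameterized members should approximately match a single much wider model, with agreement tightening as the ensemble size and the single model's width grow toward the limits the theory describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem (synthetic stand-in; any dataset would do).
n, d = 30, 5                      # n training points in d dimensions
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
X_test = rng.standard_normal((200, d))

def rf_predict(X_tr, y_tr, X_te, width, rng):
    """Ridgeless (minimum-norm) regression on `width` random ReLU features."""
    d = X_tr.shape[1]
    W = rng.standard_normal((d, width)) / np.sqrt(d)     # random first layer
    Phi_tr = np.maximum(X_tr @ W, 0.0) / np.sqrt(width)  # train features
    Phi_te = np.maximum(X_te @ W, 0.0) / np.sqrt(width)  # test features
    beta = np.linalg.pinv(Phi_tr) @ y_tr                 # min-norm interpolant
    return Phi_te @ beta

# Ensemble of K overparameterized members (each width p > n), averaged pointwise.
K, p = 200, 64
ensemble = np.mean([rf_predict(X, y, X_test, p, rng) for _ in range(K)], axis=0)

# A single, much wider model as a finite proxy for the infinite-width limit.
single = rf_predict(X, y, X_test, K * p, rng)

# The two prediction vectors should agree increasingly well as K and width grow.
print("max |ensemble - single|:", np.abs(ensemble - single).max())
```

Dividing the features by the square root of the width keeps the implied kernel stable across widths, so the ensemble members and the wide single model are comparable objects rather than differently scaled predictors.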
📝 Abstract
Classic tree-based ensembles generalize better than any single decision tree. In contrast, recent empirical studies find that modern ensembles of (overparameterized) neural networks may not provide any inherent generalization advantage over single, larger neural networks. This paper clarifies how modern overparameterized ensembles differ from their classic underparameterized counterparts, using ensembles of random feature (RF) regressors as a basis for developing theory. In contrast to the underparameterized regime, where ensembling typically induces regularization and improves generalization, we prove that infinite ensembles of overparameterized RF regressors become pointwise equivalent to (single) infinite-width RF regressors. This equivalence, which is exact for ridgeless models and approximate for small ridge penalties, implies that overparameterized ensembles and single large models exhibit nearly identical generalization. As a consequence, we can characterize the predictive variance among ensemble members and demonstrate that it quantifies the expected effects of increasing capacity rather than capturing any conventional notion of uncertainty. Our results challenge common assumptions about the advantages of ensembles in overparameterized settings, prompting a reconsideration of how well intuitions from underparameterized ensembles transfer to deep ensembles and the overparameterized regime.
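To illustrate the variance claim, the hedged sketch below (same synthetic setup and illustrative `rf_predict` as in the sketch above; this is our reading of the result, not an experiment from the paper) tracks the variance of member predictions at fixed test points as width grows. A variance that captured uncertainty about the data would not vanish with capacity; here it should decay toward zero, consistent with interpreting it as the expected effect of finite width, i.e., the gap to the infinite-width model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Same synthetic setup and ridgeless RF predictor as in the earlier sketch.
n, d = 30, 5
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
X_test = rng.standard_normal((200, d))

def rf_predict(X_tr, y_tr, X_te, width, rng):
    d = X_tr.shape[1]
    W = rng.standard_normal((d, width)) / np.sqrt(d)
    Phi_tr = np.maximum(X_tr @ W, 0.0) / np.sqrt(width)
    Phi_te = np.maximum(X_te @ W, 0.0) / np.sqrt(width)
    return Phi_te @ (np.linalg.pinv(Phi_tr) @ y_tr)

# Variance of member predictions at fixed test points, as member width grows.
# If this variance measured data uncertainty, it would not shrink with capacity.
for p in [64, 256, 1024, 4096]:
    preds = np.stack([rf_predict(X, y, X_test, p, rng) for _ in range(50)])
    print(f"width {p:4d}: mean variance across members = {preds.var(axis=0).mean():.5f}")
```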