Theoretical Limitations of Ensembles in the Age of Overparameterization

📅 2024-10-21
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
While classical underparameterized ensembles improve generalization, modern overparameterized neural network ensembles often exhibit no such benefit, and the underlying mechanism has remained unclear. Method: Leveraging random feature regression, we rigorously prove that infinitely wide overparameterized ensembles are pointwise equivalent to a single infinitely wide model. Using both ridgeless and small-ridge regression, we decompose generalization error and prediction variance. Contribution: We provide the first theoretical demonstration that overparameterized ensembles achieve generalization performance nearly identical to that of a single large model. Crucially, we show that prediction variance primarily reflects increased model capacity, not epistemic uncertainty, thereby challenging the long-standing heuristic that "ensembles must outperform individual models." Our analysis reveals that in overparameterized regimes, ensemble averaging fails to reduce variance meaningfully, because the constituent models' predictions become highly correlated. This fundamentally revises conventional wisdom on ensemble benefits in deep learning.

📝 Abstract
Classic tree-based ensembles generalize better than any single decision tree. In contrast, recent empirical studies find that modern ensembles of (overparameterized) neural networks may not provide any inherent generalization advantage over single but larger neural networks. This paper clarifies how modern overparameterized ensembles differ from their classic underparameterized counterparts, using ensembles of random feature (RF) regressors as a basis for developing theory. In contrast to the underparameterized regime, where ensembling typically induces regularization and increases generalization, we prove that infinite ensembles of overparameterized RF regressors become pointwise equivalent to (single) infinite-width RF regressors. This equivalence, which is exact for ridgeless models and approximate for small ridge penalties, implies that overparameterized ensembles and single large models exhibit nearly identical generalization. As a consequence, we can characterize the predictive variance amongst ensemble members, and demonstrate that it quantifies the expected effects of increasing capacity rather than capturing any conventional notion of uncertainty. Our results challenge common assumptions about the advantages of ensembles in overparameterized settings, prompting a reconsideration of how well intuitions from underparameterized ensembles transfer to deep ensembles and the overparameterized regime.
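The pointwise equivalence described in the abstract can be illustrated numerically with a small random-feature experiment. This is a hedged sketch, not the paper's exact construction: the ReLU feature map, the specific dimensions, and the use of a single model with K·D features as a stand-in for the infinite-width limit are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_features(X, W):
    # ReLU random-feature map: phi(x) = max(0, x @ W)
    return np.maximum(X @ W, 0.0)

def fit_ridgeless(Phi, y):
    # Minimum-norm (ridgeless) least-squares solution via pseudoinverse
    return np.linalg.pinv(Phi) @ y

n, d, D, K = 20, 5, 100, 10   # n samples, input dim d, D >> n features (overparameterized), K members
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
X_test = rng.standard_normal((50, d))

# Ensemble of K overparameterized RF regressors; average their test predictions
member_preds = []
for _ in range(K):
    W = rng.standard_normal((d, D))
    beta = fit_ridgeless(random_features(X, W), y)
    member_preds.append(random_features(X_test, W) @ beta)
ensemble_pred = np.mean(member_preds, axis=0)

# One wider model with K * D features, as a finite proxy for the infinite-width limit
W_big = rng.standard_normal((d, K * D))
beta_big = fit_ridgeless(random_features(X, W_big), y)
single_pred = random_features(X_test, W_big) @ beta_big

# As D and K grow, the gap between the two predictors is expected to shrink
gap = np.mean((ensemble_pred - single_pred) ** 2)
```

At these small, finite sizes the two predictors only agree approximately; the paper's result concerns the infinite-ensemble, infinite-width limit, where the equivalence is exact for ridgeless models.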
Problem

Research questions and friction points this paper is trying to address.

Clarify differences between modern and classic ensembles
Prove equivalence of overparameterized ensembles to single models
Challenge assumptions about ensemble advantages in overparameterization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Overparameterized ensembles equal single infinite-width models
Finite ensembles converge to single models rapidly
Predictive variance reflects capacity, not uncertainty
Niclas Dern
Technical University of Munich
John P. Cunningham
Columbia University, Zuckerman Institute
Geoff Pleiss
Assistant Professor, University of British Columbia
Machine Learning