🤖 AI Summary
Deep ensembles of Bayesian neural networks (DE-BNNs) are often assumed to inherit the calibration benefits of both components, yet their performance trade-offs have not been examined systematically. Method: a large-scale empirical evaluation of DE-BNNs across a range of datasets, neural network architectures, and BNN approximation methods, assessing in-distribution (ID) accuracy, calibration, and out-of-distribution (OOD) performance. Contribution/Results: once ensembles grow large enough, standard deep ensembles (DEs) consistently outperform DE-BNNs on in-distribution data, challenging the intuition that adding Bayesian inference to an ensemble should further improve ID performance. Sensitivity and ablation studies probe this observation. Moreover, although DE-BNNs do outperform DEs on OOD metrics, this gain comes at the cost of decreased ID performance, an ID-OOD trade-off the evaluation makes explicit. The authors also open-source the large pool of trained models to facilitate further research on this topic.
📝 Abstract
Bayesian Neural Networks (BNNs) often improve model calibration and predictive uncertainty quantification compared to point estimators such as maximum-a-posteriori (MAP). Deep ensembles (DEs) are likewise known to improve calibration, so it is natural to hypothesize that deep ensembles of BNNs (DE-BNNs) should provide even further improvements. In this work, we systematically investigate this across a number of datasets, neural network architectures, and BNN approximation methods and surprisingly find that when the ensembles grow large enough, DEs consistently outperform DE-BNNs on in-distribution data. To shed light on this observation, we conduct several sensitivity and ablation studies. Moreover, we show that even though DE-BNNs outperform DEs on out-of-distribution metrics, this comes at the cost of decreased in-distribution performance. As a final contribution, we open-source the large pool of trained models to facilitate further research on this topic.
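For readers unfamiliar with the ensembling step: both DEs and DE-BNNs combine members by averaging their predictive distributions (an equal-weight mixture). A minimal sketch, assuming softmax classifiers; the function name and shapes are illustrative placeholders, not taken from the paper's released code:

```python
import numpy as np

def ensemble_predict(member_probs):
    """Equal-weight mixture of ensemble members' predictive distributions.

    member_probs: array-like of shape (n_members, n_samples, n_classes),
    where each slice holds one member's softmax outputs. For a DE each
    member is a MAP-trained network; for a DE-BNN each member is itself
    a (approximate) posterior-averaged BNN prediction.
    """
    probs = np.asarray(member_probs, dtype=float)
    return probs.mean(axis=0)  # average over the member axis

# Two toy members disagreeing on a single 3-class input
m1 = np.array([[0.7, 0.2, 0.1]])
m2 = np.array([[0.3, 0.4, 0.3]])
print(ensemble_predict([m1, m2]))  # [[0.5 0.3 0.2]]
```

The averaged distribution is typically less overconfident than any single member, which is the mechanism behind the calibration gains the abstract refers to.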