Improving Deep Ensembles by Estimating Confusion Matrices

📅 2025-03-10
🤖 AI Summary
Deep ensembles commonly use uniform averaging, which ignores performance disparities among the constituent networks and thereby limits accuracy, calibration, and out-of-distribution (OOD) detection capability. To address this, we propose soft Dawid-Skene (sDS) aggregation, the first adaptation of the Dawid-Skene framework to deep ensembles using soft labels. sDS uses an expectation-maximization (EM) algorithm to implicitly estimate each network’s confusion matrix, which enables performance-aware, dynamic weighting without requiring ground-truth labels. Evaluated on CIFAR and ImageNet benchmarks, sDS consistently outperforms simple averaging: it improves classification accuracy, reduces expected calibration error (ECE) by over 30%, and boosts OOD detection AUC by 5–12 percentage points. These gains in accuracy, calibration, and OOD robustness are obtained together rather than traded off against one another, giving more balanced and reliable ensemble behavior.
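For readers unfamiliar with the calibration metric named above, the following is a minimal sketch of the standard binned expected calibration error. The function name, the equal-width binning, and the default of 15 bins are assumptions made for illustration; the source does not state which ECE settings were used in the experiments.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=15):
    """Binned ECE: bin-weighted average of |accuracy - confidence|.

    probs  : (N, K) array of predicted class probabilities.
    labels : (N,) array of integer ground-truth labels.
    n_bins : number of equal-width confidence bins (an assumed default).
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)

    confidence = probs.max(axis=1)        # top-class probability per sample
    prediction = probs.argmax(axis=1)     # predicted class per sample
    correct = (prediction == labels).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidence > lo) & (confidence <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidence[in_bin].mean())
            ece += in_bin.mean() * gap    # weight by fraction of samples in bin
    return ece
```

A lower value means that the confidence of the aggregated prediction tracks its empirical accuracy more closely. The same function can be applied to the output of any aggregation rule, which makes it a convenient way to compare uniform averaging with a weighted scheme.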

📝 Abstract
Ensembling in deep learning improves accuracy and calibration over single networks. The traditional aggregation approach, ensemble averaging, treats all individual networks equally by averaging their outputs. Inspired by crowdsourcing, we propose an aggregation method for deep ensembles, soft Dawid-Skene, that estimates the confusion matrices of ensemble members and weights the members according to their inferred performance. In contrast to the hard labels often used in crowdsourcing, soft Dawid-Skene aggregates soft labels. In extensive experiments, we show empirically that soft Dawid-Skene outperforms ensemble averaging in accuracy, calibration, and out-of-distribution detection.
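To make the idea of label-free, confusion-matrix-based aggregation concrete, here is a minimal sketch of one plausible soft-label variant of Dawid-Skene fitted with EM. It is not the authors' implementation: the function name `soft_dawid_skene`, the soft-count update for the confusion matrices, the expected-log-likelihood E-step, the initialization from the ensemble average, and the fixed iteration count are all assumptions chosen for illustration.

```python
import numpy as np


def soft_dawid_skene(probs, n_iter=50, eps=1e-8):
    """Aggregate soft predictions of M ensemble members without labels.

    probs : (M, N, K) array; probs[m, n] is member m's class distribution
            for input n.
    Returns (q, conf): q is the (N, K) aggregated posterior over classes,
    conf[m, k, l] is the estimated probability that member m outputs
    class l when the true class is k.
    """
    probs = np.asarray(probs, dtype=float)

    # Assumed initialization: start from plain ensemble averaging.
    q = probs.mean(axis=0)

    for _ in range(n_iter):
        # M-step: class prior and per-member confusion matrices from
        # soft counts (posterior mass times predicted mass).
        prior = q.mean(axis=0) + eps
        prior /= prior.sum()
        conf = np.einsum("nk,mnl->mkl", q, probs) + eps
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: log-posterior of each true class, using the expected
        # log-likelihood of every member's soft label.
        log_q = np.log(prior)[None, :] + np.einsum(
            "mnl,mkl->nk", probs, np.log(conf)
        )
        log_q -= log_q.max(axis=1, keepdims=True)  # numerical stability
        q = np.exp(log_q)
        q /= q.sum(axis=1, keepdims=True)

    return q, conf


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy ensemble: 3 members, 100 inputs, 4 classes.
    logits = rng.normal(size=(3, 100, 4))
    probs = np.exp(logits) / np.exp(logits).sum(axis=2, keepdims=True)
    q, conf = soft_dawid_skene(probs)
    print(q.shape, conf.shape)
```

In this sketch a member whose estimated confusion matrix is close to the identity contributes sharply peaked log-likelihood terms, so it dominates the aggregate, while a member with a near-uniform confusion matrix contributes almost nothing. That is the sense in which members are weighted by inferred performance rather than equally.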
Problem

Research questions and friction points this paper is trying to address.

Improves deep ensemble accuracy and calibration.
Estimates confusion matrices for ensemble members.
Enhances out-of-distribution detection performance.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Soft Dawid-Skene method for deep ensembles
Estimates confusion matrices for performance weighting
Aggregates soft labels, improving accuracy and calibration
Danil Kuzin
Lancaster University
Olga Isupova
University of Oxford
machine learning, Bayesian methods, topic modeling, sparse methods, anomaly detection
Steven Reece
University of Oxford
Brooke D Simmons
Lancaster University