🤖 AI Summary
Deep ensembles commonly employ uniform averaging, ignoring performance disparities among constituent networks and thereby limiting accuracy, calibration, and out-of-distribution (OOD) detection. To address this, we propose soft Dawid-Skene (sDS) aggregation, the first adaptation of the Dawid-Skene framework to deep ensembles operating on soft labels. sDS employs an expectation-maximization (EM) algorithm to implicitly estimate each network's confusion matrix, enabling performance-aware, dynamic weighting without requiring ground-truth labels. Evaluated on CIFAR and ImageNet benchmarks, sDS consistently outperforms simple averaging: it improves classification accuracy, reduces expected calibration error (ECE) by over 30%, and boosts OOD detection AUC by 5–12 percentage points. Crucially, sDS improves accuracy, calibration, and OOD robustness simultaneously, yielding more balanced and reliable ensemble behavior.
📝 Abstract
Ensembling in deep learning improves accuracy and calibration over single networks. The traditional aggregation approach, ensemble averaging, treats all individual networks equally by averaging their outputs. Inspired by crowdsourcing, we propose an aggregation method for deep ensembles, called soft Dawid Skene, that estimates the confusion matrices of ensemble members and weights them according to their inferred performance. Soft Dawid Skene aggregates soft labels, in contrast to the hard labels often used in crowdsourcing. In extensive experiments, we empirically show that soft Dawid Skene outperforms ensemble averaging in accuracy, calibration, and out-of-distribution detection.
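To make the mechanism concrete, below is a minimal sketch of an EM-based soft Dawid-Skene aggregator. It is not the authors' implementation: it assumes the common soft-label extension of Dawid-Skene in which each network's probability vector is treated as fractional counts in the likelihood, and the function name `soft_dawid_skene` and all hyperparameters are illustrative.

```python
# Hypothetical sketch of soft Dawid-Skene EM aggregation over ensemble
# soft labels. Assumes the fractional-counts soft-label extension of the
# classic Dawid-Skene model; not the paper's reference code.
import numpy as np

def soft_dawid_skene(preds, n_iter=50, eps=1e-8):
    """Aggregate ensemble soft labels with EM.

    preds: array of shape (M, N, K) holding M networks' softmax outputs
           over N inputs and K classes.
    Returns an (N, K) posterior over the true class of each input.
    """
    M, N, K = preds.shape
    # Initialize the posterior with the plain ensemble average,
    # i.e., the baseline that EM then refines.
    q = preds.mean(axis=0)                                   # (N, K)
    for _ in range(n_iter):
        # M-step: per-network confusion matrices C[m, j, k] =
        # P(network m outputs class k | true class j), estimated
        # from fractional counts weighted by the current posterior.
        C = np.einsum('nj,mnk->mjk', q, preds) + eps
        C /= C.sum(axis=2, keepdims=True)                    # normalize rows
        pi = q.mean(axis=0) + eps                            # class prior
        # E-step: update the posterior over true classes, treating each
        # soft prediction as fractional counts in the log-likelihood.
        log_q = np.log(pi) + np.einsum('mnk,mjk->nj', preds, np.log(C))
        log_q -= log_q.max(axis=1, keepdims=True)            # stability
        q = np.exp(log_q)
        q /= q.sum(axis=1, keepdims=True)
    return q

if __name__ == "__main__":
    # Toy usage: 5 networks, 100 inputs, 10 classes.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(5, 100, 10))
    preds = np.exp(logits) / np.exp(logits).sum(axis=2, keepdims=True)
    posterior = soft_dawid_skene(preds)
    print(posterior.shape)                                   # (100, 10)
```

In this formulation no ground-truth labels are needed: networks whose estimated confusion matrices are closer to the identity contribute more sharply to the aggregated posterior, which is one way the performance-aware weighting described above can arise.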