🤖 AI Summary
This study addresses the critical challenge of predictive uncertainty quantification and epistemic/aleatoric uncertainty disentanglement in multi-label chest X-ray classification—a core requirement for trustworthy medical AI. We systematically evaluate 13 uncertainty estimation methods across ResNet and Vision Transformer (ViT) architectures, and—crucially—extend Evidential Deep Learning, Heteroscedastic Classification Neural Networks, and Deep Deterministic Uncertainty to the multi-label setting for the first time. Large-scale benchmarking on the MIMIC-CXR-JPG dataset reveals substantial disparities across methods in uncertainty calibration, uncertainty decomposition fidelity, and architectural compatibility. Specifically, evidential methods excel at modeling epistemic uncertainty, whereas heteroscedastic approaches exhibit greater sensitivity to aleatoric uncertainty. Our work establishes the first empirically grounded, reproducible benchmark and practical guidance for multi-label uncertainty quantification in clinical imaging AI.
📝 Abstract
Reliable uncertainty quantification is crucial for trustworthy decision-making and the deployment of AI models in medical imaging. While prior work has explored the ability of neural networks to quantify predictive, epistemic, and aleatoric uncertainties using an information-theoretic approach in synthetic or well-defined data settings such as natural image classification, its applicability to real-life medical diagnosis tasks remains underexplored. In this study, we provide an extensive uncertainty quantification benchmark for multi-label chest X-ray classification using the MIMIC-CXR-JPG dataset. We evaluate 13 uncertainty quantification methods for convolutional (ResNet) and transformer-based (Vision Transformer) architectures across a wide range of tasks. Additionally, we extend Evidential Deep Learning, HetClass NNs, and Deep Deterministic Uncertainty to the multi-label setting. Our analysis provides insights into uncertainty estimation effectiveness and the ability to disentangle epistemic and aleatoric uncertainties, revealing method- and architecture-specific strengths and limitations.
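To make the information-theoretic disentanglement concrete, here is a minimal sketch (not the paper's exact implementation) of how predictive uncertainty is commonly split into aleatoric and epistemic parts in a multi-label setting: given per-label sigmoid outputs from several stochastic forward passes (e.g. MC dropout or an ensemble), the entropy of the mean prediction is the total uncertainty, the mean of the per-pass entropies is the aleatoric part, and their difference (the mutual information between prediction and model parameters) is the epistemic part. The function name and shapes below are illustrative assumptions.

```python
import numpy as np

def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a Bernoulli variable with parameter p."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def decompose_uncertainty(probs):
    """
    probs: array of shape (S, L) -- sigmoid outputs from S stochastic
    forward passes for L labels (treated as independent Bernoullis).
    Returns per-label (total, aleatoric, epistemic) uncertainties.
    """
    mean_p = probs.mean(axis=0)                      # predictive mean per label
    total = binary_entropy(mean_p)                   # H[E_w[p]]  (predictive entropy)
    aleatoric = binary_entropy(probs).mean(axis=0)   # E_w[H[p]]  (expected entropy)
    epistemic = total - aleatoric                    # mutual information I(y; w)
    return total, aleatoric, epistemic

# Example: 8 stochastic passes over 3 labels
rng = np.random.default_rng(0)
probs = rng.uniform(0.1, 0.9, size=(8, 3))
total, aleatoric, epistemic = decompose_uncertainty(probs)
```

Because binary entropy is concave, Jensen's inequality guarantees the epistemic term is non-negative: disagreement across passes inflates the entropy of the mean above the mean of the entropies.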