🤖 AI Summary
Bayesian generative models (e.g., VAEs) in medical imaging suffer from poor robustness to distributional shift and struggle to reliably detect out-of-distribution samples or disentangle representation bias. To address this, we propose SLUG—the first scalable Bayesian uncertainty quantification (UQ) method for VAEs that jointly leverages Laplace approximation and stochastic trace estimation. SLUG produces pixel-level UQ scores that significantly outperform encoder-predicted variances and explicitly decouple reconstruction error from demographic (e.g., racial) representation bias. Evaluated on dermoscopic images, SLUG’s UQ scores exhibit strong correlation with both reconstruction error and racial bias metrics. Moreover, SLUG accurately localizes confounding artifacts—including ink markings and rulers—thereby exposing the model’s reliance on spurious predictive shortcuts. By enabling interpretable, computationally efficient, and pixel-wise uncertainty modeling, SLUG establishes a novel paradigm for trustworthy, clinically deployable AI.
📝 Abstract
Generative models are popular for medical imaging tasks such as anomaly detection, feature extraction, data visualization, and image generation. Since they are parameterized by deep learning models, they are often sensitive to distribution shifts and unreliable when applied to out-of-distribution data, creating a risk of, for example, underrepresentation bias. This behavior can be flagged using uncertainty quantification (UQ) methods for generative models, but their availability remains limited. We propose SLUG, a new UQ method for VAEs that combines recent advances in Laplace approximations with stochastic trace estimators to scale gracefully with image dimensionality. We show that our UQ score -- unlike the VAE's encoder variances -- correlates strongly with reconstruction error and with racial underrepresentation bias for dermatological images. We also show how pixel-wise uncertainty can detect out-of-distribution image content such as ink, rulers, and patches, which are known to induce shortcut learning in predictive models.
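The scaling idea behind pairing a Laplace approximation with stochastic trace estimation can be illustrated with a Hutchinson-style estimator: a trace is recovered from matrix-vector products alone, so a large curvature or covariance matrix never has to be materialized. This is a minimal illustrative sketch, not the paper's SLUG implementation; the function name and the toy diagonal matrix are assumptions:

```python
import numpy as np

def hutchinson_trace(matvec, dim, num_probes=64, seed=0):
    """Estimate tr(A) using only matrix-vector products A @ z.

    Rademacher probes z (entries +/-1) satisfy E[z^T A z] = tr(A),
    so a full matrix A -- e.g. a Laplace posterior covariance over
    high-dimensional images -- never needs to be formed explicitly.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=dim)
        total += z @ matvec(z)
    return total / num_probes

# Toy check: for a diagonal matrix the estimator is exact, because
# z_i**2 == 1 for every Rademacher probe.
diag = np.array([0.5, 1.5, 4.0])
est = hutchinson_trace(lambda z: diag * z, dim=3, num_probes=16)
print(est)  # matches diag.sum() == 6.0 for a diagonal matrix
```

In practice `matvec` would be implemented with automatic differentiation (e.g. Hessian-vector or Jacobian-vector products), which is what lets this style of estimator scale gracefully with image dimensionality.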