Causal Attribution of Model Performance Gaps in Medical Imaging Under Distribution Shifts

📅 2025-12-09

🤖 AI Summary

Medical image segmentation models suffer significant performance degradation under distribution shifts, yet the underlying causal mechanisms remain poorly understood. To address this, we extend causal attribution frameworks to high-dimensional, pixel-level segmentation tasks for the first time. We construct a causal graph modeling the confounding effects of imaging acquisition protocols and annotation discrepancies, and use Shapley values to attribute performance changes fairly and interpretably to individual mechanisms. We further propose a robust multi-center, multi-annotator evaluation paradigm and a few-shot adaptation technique supporting context-driven interventions. Evaluated on multiple sclerosis lesion segmentation across four clinical centers and seven annotators, our method disentangles the independent contributions of annotation protocol (7.4% ± 8.9% DSC attribution, dominant when crossing annotators) and imaging acquisition (6.5% ± 9.1%, dominant when crossing centers) to model failure, providing actionable, causally grounded guidance for improving segmentation robustness.
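The attributions above are expressed in terms of the Dice similarity coefficient (DSC), DSC = 2|P ∩ G| / (|P| + |G|) for a predicted mask P and a reference mask G. For reference, a minimal NumPy sketch of the metric for binary masks; the function name `dice_coefficient` and the empty-mask convention are assumptions for illustration, not taken from the paper.

```python
import numpy as np


def dice_coefficient(pred, target) -> float:
    """Dice similarity coefficient (DSC) between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (a common convention, not a universal one).
        return 1.0
    # Twice the overlap divided by the total foreground size of both masks.
    return float(2.0 * np.logical_and(pred, target).sum() / denom)
```

For example, `dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])` gives 0.5: one overlapping pixel, two foreground pixels in each mask.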

📝 Abstract

Deep learning models for medical image segmentation suffer significant performance drops due to distribution shifts, but the causal mechanisms behind these drops remain poorly understood. We extend causal attribution frameworks to high-dimensional segmentation tasks, quantifying how acquisition protocols and annotation variability independently contribute to performance degradation. We model the data-generating process through a causal graph and employ Shapley values to fairly attribute performance changes to individual mechanisms. Our framework addresses unique challenges in medical imaging: high-dimensional outputs, limited samples, and complex mechanism interactions. Validation on multiple sclerosis (MS) lesion segmentation across 4 centers and 7 annotators reveals context-dependent failure modes: annotation protocol shifts dominate when crossing annotators (7.4% ± 8.9% DSC attribution), while acquisition shifts dominate when crossing imaging centers (6.5% ± 9.1%). This mechanism-specific quantification enables practitioners to prioritize targeted interventions based on deployment context.
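The Shapley step can be illustrated with a small exact computation. This is a sketch under stated assumptions, not the authors' implementation: each mechanism (hypothetically named `acquisition` and `annotation`) is treated as a player, and the value of a coalition S is taken to be the DSC drop observed when only the mechanisms in S are switched to their target-domain form. The function name `shapley_attribution` and the DSC figures in the example are hypothetical and do not come from the paper.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_attribution(
    mechanisms: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley value of each mechanism under the coalition value function `value`.

    phi_i = sum over subsets S of the other mechanisms of
            |S|! (n - |S| - 1)! / n! * (value(S + {i}) - value(S))
    """
    n = len(mechanisms)
    phi: Dict[str, float] = {}
    for m in mechanisms:
        others = [o for o in mechanisms if o != m]
        total = 0.0
        for k in range(n):  # size of the coalition S that m joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {m}) - value(s))  # marginal contribution of m
        phi[m] = total
    return phi


# Hypothetical mean DSC when only the listed mechanisms are shifted to the target domain.
dsc = {
    frozenset(): 0.70,
    frozenset({"acquisition"}): 0.64,
    frozenset({"annotation"}): 0.62,
    frozenset({"acquisition", "annotation"}): 0.55,
}
# Value of a coalition = DSC drop caused by shifting exactly those mechanisms.
phi = shapley_attribution(["acquisition", "annotation"], lambda s: dsc[frozenset()] - dsc[s])
print(phi)  # ≈ 0.065 for acquisition, ≈ 0.085 for annotation
```

With these made-up numbers the total drop of 0.15 splits into about 0.065 for acquisition and 0.085 for annotation. The attributions always sum to the total drop (the Shapley efficiency property), and each mechanism's share averages its marginal effect over every order in which the shifts could be applied, which is what makes the split "fair" when mechanisms interact. Exact enumeration needs 2^n value evaluations, which is cheap for two or three mechanisms.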

Problem

Research questions and friction points this paper is trying to address.

The causal mechanisms behind the performance drops of medical image segmentation models under distribution shift are poorly understood.

It is unclear how much acquisition protocols and annotation variability each contribute, independently, to performance degradation.

Attribution in medical imaging must cope with high-dimensional outputs, limited samples, and complex interactions between mechanisms.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends causal attribution frameworks to high-dimensional medical segmentation tasks

Models the data-generating process with a causal graph and uses Shapley values to attribute performance changes to individual mechanisms

Quantifies acquisition and annotation contributions to performance gaps
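The causal-graph idea behind these contributions can be illustrated with a toy simulation. Everything below is hypothetical and far simpler than MS lesion MRI: a 1-D "scan" whose lesion contrast depends on an acquisition mechanism (`gain`), a label whose boundary depends on an annotation mechanism (`margin`), and a fixed threshold standing in for a source-trained network. The names `simulate_mean_dsc`, `gain`, and `margin` are illustrative inventions, not the paper's. The point is the structure: because the graph factorizes the data-generating process into separate mechanisms, each one can be switched to its target form on its own while lesions and noise are held fixed, yielding the per-subset DSC values that a Shapley attribution consumes.

```python
from itertools import combinations

import numpy as np

# Hypothetical toy causal graph (not the paper's model):
#   lesion L -> image X  via the acquisition mechanism (scanner gain)
#   lesion L -> label Y  via the annotation mechanism (annotator boundary protocol)
#   a fixed model f(X) = X > 0.5 plays the role of a source-trained network.
MECHANISMS = ("acquisition", "annotation")


def dice(pred: np.ndarray, label: np.ndarray) -> float:
    denom = pred.sum() + label.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(pred, label).sum() / denom)


def simulate_mean_dsc(shifted: frozenset, n_cases: int = 50, size: int = 64, seed: int = 0) -> float:
    """Mean DSC when only the mechanisms in `shifted` take their (hypothetical) target form."""
    # Same seed for every subset: lesions and noise are shared, so only the swapped mechanisms differ.
    rng = np.random.default_rng(seed)
    gain = 0.7 if "acquisition" in shifted else 1.0  # target scanner: weaker lesion contrast
    margin = 1 if "annotation" in shifted else 0     # target annotator: boundary drawn 1 px wider
    scores = []
    for _ in range(n_cases):
        start = int(rng.integers(5, size - 15))
        width = int(rng.integers(4, 10))
        lesion = np.zeros(size, dtype=bool)
        lesion[start:start + width] = True
        strength = rng.uniform(0.6, 1.2, size)
        noise = rng.normal(0.0, 0.1, size)
        image = gain * strength * lesion + noise              # acquisition mechanism
        label = np.zeros(size, dtype=bool)
        label[start - margin:start + width + margin] = True  # annotation mechanism
        pred = image > 0.5                                    # fixed source-trained model
        scores.append(dice(pred, label))
    return float(np.mean(scores))


if __name__ == "__main__":
    # One row per subset of shifted mechanisms: the coalition values for an attribution step.
    for k in range(len(MECHANISMS) + 1):
        for subset in combinations(MECHANISMS, k):
            print(sorted(subset), round(simulate_mean_dsc(frozenset(subset)), 3))
```

Sharing the random draws across subsets means the differences between rows reflect only the swapped mechanisms rather than resampling noise; with real data, the analogous step would require estimating each mechanism from limited multi-center, multi-annotator samples.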

🔎 Similar Papers

Selective Prediction for Semantic Segmentation using Post-Hoc Confidence Estimation and Its Performance under Distribution Shift