PRISM: High-Resolution&Precise Counterfactual Medical Image Generation using Language-guided Stable Diffusion

📅 2025-02-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Medical imaging deep learning faces challenges including spurious correlations, class imbalance, and scarcity of textual annotations. To address these, we propose a language-guided counterfactual generation framework based on Stable Diffusion, the first to enable independent, editable interventions on medical device artifacts and lesion attributes while preserving anatomical fidelity—precisely adding or removing specific clinical features. Our method integrates fine-tuned multimodal Stable Diffusion, medical vision–language alignment, fine-grained conditional control, and a counterfactual mask guidance mechanism. Evaluated on multiple public benchmarks, it reduces spurious correlation bias in downstream classifiers by over 40%, significantly enhancing model robustness and generalization. The implementation is open-sourced to facilitate clinical interpretability validation and reproducible research.

Technology Category

Application Category

📝 Abstract
Developing reliable and generalizable deep learning systems for medical imaging faces significant obstacles due to spurious correlations, data imbalances, and limited text annotations in datasets. Addressing these challenges requires architectures robust to the unique complexities posed by medical imaging data. The rapid advancements in vision-language foundation models within the natural image domain prompt the question of how they can be adapted for medical imaging tasks. In this work, we present PRISM, a framework that leverages foundation models to generate high-resolution, language-guided medical image counterfactuals using Stable Diffusion. Our approach demonstrates unprecedented precision in selectively modifying spurious correlations (the medical devices) and disease features, enabling the removal and addition of specific attributes while preserving other image characteristics. Through extensive evaluation, we show how PRISM advances counterfactual generation and enables the development of more robust downstream classifiers for clinically deployable solutions. To facilitate broader adoption and research, we make our code publicly available at https://github.com/Amarkr1/PRISM.
Problem

Research questions and friction points this paper is trying to address.

Generates high-resolution medical image counterfactuals using language guidance.
Addresses spurious correlations and data imbalances in medical imaging datasets.
Enhances robustness of downstream classifiers for clinical deployment.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages Stable Diffusion for medical imaging
Generates high-resolution, language-guided counterfactuals
Enables precise modification of image attributes
🔎 Similar Papers
No similar papers found.
A
Amar Kumar
Center for Intelligent Machines, McGill University, Montreal, Canada; MILA (Quebec AI institute), Montreal, Canada
Anita Kriz
Anita Kriz
McGill University
Mohammad Havaei
Mohammad Havaei
Google
Deep learningMachine learningComputer vision
T
T. Arbel
Center for Intelligent Machines, McGill University, Montreal, Canada; MILA (Quebec AI institute), Montreal, Canada