🤖 AI Summary
This work addresses the limitations of existing counterfactual medical image generation methods, which often inadvertently alter non-target regions (such as demographic attributes) and lack interpretability, thereby failing to meet clinical requirements. To overcome these challenges, we propose InstructX2X, a novel model that achieves region-specific editing by modifying only disease-relevant areas while simultaneously generating guidance maps that provide interpretable visual evidence. We also introduce MIMIC-EDIT-INSTRUCTION, the first instruction-tuning dataset derived from expert-verified medical visual question answering (VQA) pairs, enabling both model training and expert evaluation. Experimental results demonstrate that our approach achieves state-of-the-art performance across multiple metrics, successfully generating high-quality, spatially constrained, and interpretable counterfactual chest X-ray images.
📝 Abstract
Counterfactual medical image generation has emerged as a critical tool for enhancing AI-driven systems in the medical domain by answering "what-if" questions. However, existing approaches face two fundamental limitations. First, they fail to prevent unintended modifications, resulting in collateral changes to demographic attributes when only disease features should be affected. Second, they lack interpretability in their editing process, which significantly limits their utility in real-world medical applications. To address these limitations, we present InstructX2X, a novel interpretable local editing model for counterfactual medical image generation featuring Region-Specific Editing. This approach restricts modifications to specific regions, effectively preventing unintended changes, while simultaneously providing a Guidance Map that offers inherently interpretable visual explanations of the editing process. Additionally, we introduce MIMIC-EDIT-INSTRUCTION, a dataset for counterfactual medical image generation derived from expert-verified medical VQA pairs. Through extensive experiments, InstructX2X achieves state-of-the-art performance across all major evaluation metrics. Our model successfully generates high-quality counterfactual chest X-ray images along with interpretable explanations.