MedSteer: Counterfactual Endoscopic Synthesis via Training-Free Activation Steering

📅 2026-03-07

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses the challenge of editing a specific pathological concept in generated medical images while preserving anatomical structure. Text prompting cannot do this, because re-prompting regenerates the whole image, and inversion-based editing introduces reconstruction error. The authors propose MedSteer, a training-free activation-steering framework that identifies a pathology vector for each contrastive prompt pair in the cross-attention layers of a diffusion transformer. During inference, activations are steered along these vectors to generate endoscopic counterfactual image pairs in which only the target concept changes while anatomy remains intact, with no model retraining or image inversion required. Evaluated on Kvasir v3 and HyperKvasir, the method achieves concept flip rates of 0.800, 0.925, and 0.950 across three clinical concept pairs and a dye removal rate of 75% (versus 20% for PnP and 10% for h-Edit). Augmenting with MedSteer counterfactual pairs raises the AUC of a downstream polyp-detection ViT to 0.9755, compared with 0.9083 for quantity-matched re-prompting.


📝 Abstract

Generative diffusion models are increasingly used for medical imaging data augmentation, but text prompting cannot produce causal training data. Re-prompting rerolls the entire generation trajectory, altering anatomy, texture, and background. Inversion-based editing methods introduce reconstruction error that causes structural drift. We propose MedSteer, a training-free activation-steering framework for endoscopic synthesis. MedSteer identifies a pathology vector for each contrastive prompt pair in the cross-attention layers of a diffusion transformer. At inference time, it steers image activations along this vector, generating counterfactual pairs from scratch where the only difference is the steered concept. All other structure is preserved by construction. We evaluate MedSteer across three experiments on Kvasir v3 and HyperKvasir. On counterfactual generation across three clinical concept pairs, MedSteer achieves flip rates of 0.800, 0.925, and 0.950, outperforming the best inversion-based baseline in both concept flip rate and structural preservation. On dye disentanglement, MedSteer achieves 75% dye removal against 20% (PnP) and 10% (h-Edit). On downstream polyp detection, augmenting with MedSteer counterfactual pairs achieves ViT AUC of 0.9755 versus 0.9083 for quantity-matched re-prompting, confirming that counterfactual structure drives the gain. Code is available at https://github.com/phamtrongthang123/medsteer
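The abstract does not spell out how the pathology vector is computed or how activations are moved along it, so the following is only a minimal NumPy sketch of one common activation-steering recipe, not MedSteer's actual implementation. It assumes the vector is the normalized difference of mean activations between the two prompts of a contrastive pair, and that steering removes each token's component along that vector. The function names `pathology_vector` and `steer`, the `strength` parameter, and the synthetic activations are all hypothetical.

```python
import numpy as np


def pathology_vector(acts_pos, acts_neg):
    """Unit-norm difference of mean activations (assumed recipe, not from the paper).

    acts_pos, acts_neg: arrays of shape (num_samples, dim) collected under the
    two prompts of a contrastive pair.
    """
    diff = acts_pos.mean(axis=0) - acts_neg.mean(axis=0)
    return diff / np.linalg.norm(diff)


def steer(acts, direction, strength=1.0):
    """Subtract `strength` times each token's projection onto `direction`.

    acts: array of shape (num_tokens, dim); direction: unit vector of shape (dim,).
    Components orthogonal to `direction` are left untouched.
    """
    coeff = acts @ direction  # per-token projection coefficient
    return acts - strength * np.outer(coeff, direction)


# Synthetic stand-ins for cross-attention activations (hypothetical data).
rng = np.random.default_rng(0)
dim = 16
concept = np.zeros(dim)
concept[0] = 1.0  # pretend the concept lives along the first axis

acts_neg = rng.normal(size=(32, dim))
acts_pos = rng.normal(size=(32, dim)) + 3.0 * concept

v = pathology_vector(acts_pos, acts_neg)

image_acts = rng.normal(size=(8, dim)) + 3.0 * concept
steered = steer(image_acts, v, strength=1.0)

# After full-strength steering, no token has any component left along v.
residual = np.abs(steered @ v).max()
```

In this sketch, the only change to the activations is along `v`, which mirrors the abstract's claim that everything other than the steered concept is preserved by construction. In a real diffusion transformer the same operation would be applied inside the cross-attention layers at each denoising step, for example through forward hooks.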

Problem

Research questions and friction points this paper is trying to address.

counterfactual generation

medical image editing

diffusion models

structural preservation

endoscopic synthesis

Innovation

Methods, ideas, or system contributions that make the work stand out.

activation steering

counterfactual generation

training-free editing