AI Summary
This work addresses the scarcity of high-quality, large-scale multimodal datasets that jointly support understanding and generation in medical image editing, a key bottleneck hindering the advancement of generative models in this domain. To bridge this gap, the study introduces the first systematic categorization of medical image editing tasks into three types: perception, modification, and transformation. Building upon this framework, the authors construct MieDB-100k, a dataset comprising 100,000 samples generated via modality-specific expert models and rule-driven synthesis, followed by rigorous human validation to ensure clinical fidelity and diversity. Models trained on MieDB-100k demonstrate significantly superior performance and generalization compared to existing open- and closed-source alternatives, establishing a robust foundation for future research in medical image editing.
Abstract
The scarcity of high-quality data remains a primary bottleneck in adapting multimodal generative models for medical image editing. Existing medical image editing datasets often suffer from limited diversity, neglect of medical image understanding, and an inability to balance quality with scalability. To address these gaps, we propose MieDB-100k, a large-scale, high-quality, and diverse dataset for text-guided medical image editing. It categorizes editing tasks into three perspectives, Perception, Modification, and Transformation, covering both understanding and generation abilities. We construct MieDB-100k via a data curation pipeline that leverages both modality-specific expert models and rule-based data synthesis methods, followed by rigorous manual inspection to ensure clinical fidelity. Extensive experiments demonstrate that models trained on MieDB-100k consistently outperform both open-source and proprietary models while exhibiting strong generalization ability. We anticipate that this dataset will serve as a cornerstone for future advancements in specialized medical image editing.
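To make the three-way taxonomy concrete, here is a minimal Python sketch of how a sample record covering Perception, Modification, and Transformation tasks could be structured. The class names, field names, file names, and instruction strings are illustrative assumptions for this sketch, not MieDB-100k's actual schema.

```python
# Hypothetical record layout reflecting the taxonomy described above.
# All names and values below are assumptions, not the dataset's real format.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class EditTask(Enum):
    PERCEPTION = "perception"          # understanding-oriented, e.g. describing or locating a finding
    MODIFICATION = "modification"      # editing content within the same modality
    TRANSFORMATION = "transformation"  # converting between modalities or acquisition styles

@dataclass
class MieSample:
    source_image: str                   # path to the input medical image
    instruction: str                    # free-text editing instruction
    task: EditTask                      # one of the three task categories
    target_image: Optional[str] = None  # expected output image (generation tasks)
    target_text: Optional[str] = None   # expected output text (perception tasks)

# One illustrative instance per category:
samples = [
    MieSample("ct_001.png", "Describe the lesion in the left lung.",
              EditTask.PERCEPTION, target_text="Lesion in the left lower lobe."),
    MieSample("xray_002.png", "Remove the pacemaker artifact.",
              EditTask.MODIFICATION, target_image="xray_002_edited.png"),
    MieSample("mri_003.png", "Render this T1-weighted scan in T2 contrast.",
              EditTask.TRANSFORMATION, target_image="mri_003_t2.png"),
]
```

Under this reading, perception tasks pair an image with a textual target (understanding), while modification and transformation tasks pair it with an edited image (generation), matching the abstract's claim that the dataset exercises both abilities.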