MedEBench: Revisiting Text-instructed Image Editing on Medical Domain

📅 2025-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Text-guided medical image editing lacks standardized evaluation and clear clinical grounding. To address this, we introduce MedEBench, a clinically oriented benchmark of 1,182 clinically sourced image-prompt triplets, each paired with an ROI (Region of Interest) mask and a description of the expected change, spanning 70 editing tasks across 13 anatomical regions. MedEBench couples this data with an evaluation framework covering three dimensions: Editing Accuracy, Contextual Preservation, and Visual Quality, and with a failure-analysis protocol that grounds model attention maps against ROI masks via IoU to detect mislocalization. Using this framework, we systematically evaluate seven state-of-the-art editing models, uncovering common failure modes including localization errors, anatomical inconsistency, and introduced artifacts. MedEBench provides a reproducible, clinically grounded foundation for developing and diagnosing medical image editing systems.
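
The summary describes the failure-analysis protocol only at a high level. Below is a minimal illustrative sketch of how an attention-ROI IoU check for mislocalization can be computed; the function name `attention_roi_iou`, the min-max normalization, and the 0.5 binarization threshold are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def attention_roi_iou(attention_map: np.ndarray,
                      roi_mask: np.ndarray,
                      threshold: float = 0.5) -> float:
    """IoU between a binarized attention map and a ground-truth ROI mask."""
    att = attention_map.astype(np.float64)
    # Min-max normalize so the binarization threshold is scale-invariant.
    att = (att - att.min()) / (att.max() - att.min() + 1e-8)
    att_bin = att >= threshold

    roi = roi_mask.astype(bool)
    intersection = np.logical_and(att_bin, roi).sum()
    union = np.logical_or(att_bin, roi).sum()
    return float(intersection / union) if union > 0 else 0.0

# Toy usage: random attention barely overlaps a 20x20 ROI, so the IoU is
# low, which under this kind of protocol would flag the edit as mislocalized.
rng = np.random.default_rng(0)
attention = rng.random((64, 64))
roi = np.zeros((64, 64), dtype=bool)
roi[20:40, 20:40] = True
print(f"attention-ROI IoU = {attention_roi_iou(attention, roi):.3f}")
```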

📝 Abstract
Text-guided image editing has seen rapid progress in natural image domains, but its adaptation to medical imaging remains limited and lacks standardized evaluation. Clinically, such editing holds promise for simulating surgical outcomes, creating personalized teaching materials, and enhancing patient communication. To bridge this gap, we introduce MedEBench, a comprehensive benchmark for evaluating text-guided medical image editing. It consists of 1,182 clinically sourced image-prompt triplets spanning 70 tasks across 13 anatomical regions. MedEBench offers three key contributions: (1) a clinically relevant evaluation framework covering Editing Accuracy, Contextual Preservation, and Visual Quality, supported by detailed descriptions of expected change and ROI (Region of Interest) masks; (2) a systematic comparison of seven state-of-the-art models, revealing common failure patterns; and (3) a failure analysis protocol based on attention grounding, using IoU between attention maps and ROIs to identify mislocalization. MedEBench provides a solid foundation for developing and evaluating reliable, clinically meaningful medical image editing systems. Project website: https://mliuby.github.io/MedEBench_Website/
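
The abstract names the three evaluation dimensions but not their formulas. As a rough illustration of how ROI masks can separate "what should change" from "what should stay unchanged", here is a minimal Python sketch: the function name `masked_edit_metrics`, the MAE/SSIM metric choices, the assumption of grayscale float images, and the availability of a post-edit reference image are all mine, not the paper's definitions.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def masked_edit_metrics(edited: np.ndarray,
                        reference: np.ndarray,
                        source: np.ndarray,
                        roi_mask: np.ndarray) -> dict:
    """Score the edit inside the ROI and preservation outside it.

    All images are float arrays in [0, 1] with shape (H, W);
    roi_mask is a boolean array marking the region the prompt targets.
    """
    roi = roi_mask.astype(bool)

    # Editing-accuracy proxy: pixel error vs. the reference, inside the ROI only.
    edit_mae = float(np.abs(edited[roi] - reference[roi]).mean())

    # Contextual-preservation proxy: SSIM between the edited output and the
    # original source, averaged over background pixels via the full SSIM map.
    _, ssim_map = ssim(edited, source, data_range=1.0, full=True)
    background_ssim = float(ssim_map[~roi].mean())

    return {"edit_mae_in_roi": edit_mae, "background_ssim": background_ssim}
```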
Problem

Research questions and friction points this paper is trying to address.

No standardized evaluation for text-guided medical image editing
Clinical need to simulate surgical outcomes, build teaching materials, and improve patient communication
No comprehensive benchmark covering clinically realistic editing tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comprehensive benchmark of 1,182 clinically sourced cases for text-guided medical image editing
Clinically relevant evaluation framework (Editing Accuracy, Contextual Preservation, Visual Quality) with ROI masks and expected-change descriptions
Failure analysis protocol that grounds attention maps against ROI masks via IoU