🤖 AI Summary
Existing evaluation methods for text-to-image generation struggle to assess whether clinical semantics—such as anatomical structures and pathological features—are accurately represented in generated medical images. To address this gap, this work proposes CSEval, a framework that leverages large language models to automatically evaluate the clinical semantic consistency between generated images and their conditioning text prompts. By integrating semantic alignment analysis and cross-modal matching, CSEval establishes a scalable and clinically trustworthy automated evaluation system. Experimental results show that CSEval detects clinically relevant semantic discrepancies overlooked by conventional metrics, and that its assessments agree strongly with expert judgments. This provides critical support for the safe deployment of generative models in medical applications.
📝 Abstract
Text-to-image generation has been increasingly applied in medical domains for purposes such as data augmentation and education. Evaluating the quality and clinical reliability of these generated images is essential. However, existing methods mainly assess image realism or diversity, failing to capture whether the generated images reflect the intended clinical semantics, such as anatomical location and pathology. In this study, we propose the Clinical Semantics Evaluator (CSEval), a framework that leverages language models to assess clinical semantic alignment between generated images and their conditioning prompts. Our experiments show that CSEval identifies semantic inconsistencies overlooked by other metrics and correlates with expert judgment. CSEval provides a scalable and clinically meaningful complement to existing evaluation methods, supporting the safe adoption of generative models in healthcare.
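The abstract describes CSEval only at a high level. As an illustration of the general idea—not the paper's actual method—the sketch below scores semantic consistency between a conditioning prompt and a textual description of the generated image. In CSEval a language model would perform the extraction and comparison; here, `extract_clinical_terms` and the keyword lists are hypothetical stand-ins for that step:

```python
# Hypothetical sketch of prompt-vs-image semantic consistency scoring.
# The keyword lists and extractor below are toy stand-ins for an
# LLM-based clinical entity extractor; they are NOT the paper's method.

ANATOMY = {"lung", "liver", "left", "right", "upper", "lower", "lobe"}
PATHOLOGY = {"nodule", "pneumonia", "effusion", "mass", "consolidation"}

def extract_clinical_terms(text: str) -> set[str]:
    """Toy stand-in for LLM-based clinical entity extraction."""
    tokens = {t.strip(".,").lower() for t in text.split()}
    return tokens & (ANATOMY | PATHOLOGY)

def semantic_consistency(prompt: str, image_report: str) -> float:
    """Fraction of the prompt's clinical terms present in the image report."""
    wanted = extract_clinical_terms(prompt)
    found = extract_clinical_terms(image_report)
    if not wanted:
        return 1.0  # no clinical semantics requested: trivially consistent
    return len(wanted & found) / len(wanted)

# A laterality error (right -> left) lowers the score even though the
# pathology itself ("nodule") is correctly depicted.
score = semantic_consistency(
    "A nodule in the right upper lobe of the lung",
    "Chest image shows a nodule located in the left lower lobe",
)
```

A realism- or diversity-oriented metric would not penalize the laterality swap above, which is exactly the kind of clinically relevant discrepancy the abstract says CSEval is designed to catch.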