Eyes Tell the Truth: GazeVal Highlights Shortcomings of Generative AI in Medical Imaging

📅 2025-03-26

🤖 AI Summary

Generative AI for medical imaging often falls short of clinical authenticity: existing computational evaluation metrics are misaligned with how radiologists recognize and interpret images, so synthetic images can score well numerically yet remain clinically unconvincing. Method: The paper proposes GazeVal, an evaluation framework that combines radiologists' eye-tracking data with their direct evaluations in two tasks, a diagnostic test and a Turing test, to assess the realism of synthetic images from the expert's perspective. Contribution/Results: In experiments with sixteen radiologists, 96.6% of the images generated by a recent state-of-the-art algorithm were identified as fake, exposing the limits of generative AI in producing clinically accurate images. The work aims to narrow the gap between generative quality assessment and real-world clinical requirements.

📝 Abstract

The demand for high-quality synthetic data for model training and augmentation has never been greater in medical imaging. However, current evaluations predominantly rely on computational metrics that fail to align with human expert recognition. This leads to synthetic images that may appear realistic numerically but lack clinical authenticity, posing significant challenges in ensuring the reliability and effectiveness of AI-driven medical tools. To address this gap, we introduce GazeVal, a practical framework that synergizes expert eye-tracking data with direct radiological evaluations to assess the quality of synthetic medical images. GazeVal leverages radiologists' gaze patterns, which provide a deeper understanding of how experts perceive and interact with synthetic data in different tasks (i.e., diagnostic or Turing tests). Experiments with sixteen radiologists revealed that 96.6% of the images generated by the most recent state-of-the-art AI algorithm were identified as fake, demonstrating the limitations of generative AI in producing clinically accurate images.
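The headline figure in the abstract is a detection rate from the Turing test: the share of generated images that radiologists judged to be fake. The abstract does not say how responses were aggregated, so the following is only a minimal sketch that assumes one record per reading (one radiologist judging one image) and counts every reading equally. The `Rating` record, its field names, and the toy data are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Rating:
    """One radiologist's Turing-test response to one image (hypothetical schema)."""
    reader_id: str
    image_id: str
    is_synthetic: bool  # ground truth: was the image generated?
    judged_fake: bool   # the reader's answer


def synthetic_detection_rate(ratings: Iterable[Rating]) -> float:
    """Fraction of readings of synthetic images that were judged fake."""
    synthetic = [r for r in ratings if r.is_synthetic]
    if not synthetic:
        raise ValueError("no readings of synthetic images")
    return sum(r.judged_fake for r in synthetic) / len(synthetic)


# Toy data for illustration only; these are NOT results from the study.
toy_ratings = [
    Rating("r1", "img_a", is_synthetic=True, judged_fake=True),
    Rating("r1", "img_b", is_synthetic=True, judged_fake=False),
    Rating("r2", "img_a", is_synthetic=True, judged_fake=True),
    Rating("r2", "img_c", is_synthetic=False, judged_fake=False),
]

rate = synthetic_detection_rate(toy_ratings)
print(f"detection rate on toy data: {rate:.3f}")
```

Readings of real images are ignored here because the reported figure concerns generated images only. A per-image aggregation (for example, a majority vote across readers) would be an equally plausible reading of the abstract and could give a different number.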

Problem

Research questions and friction points this paper is trying to address.

Evaluations of synthetic medical images fail to capture clinical authenticity

Current computational metrics are misaligned with human expert recognition

Generative AI struggles to produce clinically accurate images

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines eye-tracking with radiological evaluations

Uses gaze patterns to assess synthetic images

Reveals AI limitations via expert perception
