Eyes Tell the Truth: GazeVal Highlights Shortcomings of Generative AI in Medical Imaging

📅 2025-03-26
🤖 AI Summary
Generative AI for medical imaging lacks clinical authenticity: existing computational evaluation metrics are misaligned with radiologists’ diagnostic reasoning, so images can look realistic by those metrics yet remain diagnostically untrustworthy. Method: We propose GazeVal, the first evaluation framework that integrates radiologists’ eye-tracking trajectories with double-blind image interpretation in two tasks, a diagnostic test and a Turing test, to establish a cognition-informed paradigm for assessing realism. The approach combines eye-movement recording, multimodal quality modeling, and analysis of expert behavior. Contribution/Results: In experiments with 16 board-certified radiologists, 96.6% of the images generated by a state-of-the-art model were identified as synthetic, exposing critical deficits in clinical fidelity. This work bridges the gap between generative quality assessment and real-world clinical requirements, providing an interpretable, empirically grounded, expert-based benchmark for trustworthy medical AI.

📝 Abstract
In medical imaging, the demand for high-quality synthetic data for model training and augmentation has never been greater. However, current evaluations rely predominantly on computational metrics that do not align with expert human recognition. As a result, synthetic images may appear realistic numerically yet lack clinical authenticity, which makes it difficult to ensure the reliability and effectiveness of AI-driven medical tools. To address this gap, we introduce GazeVal, a practical framework that combines expert eye-tracking data with direct radiological evaluations to assess the quality of synthetic medical images. GazeVal uses radiologists’ gaze patterns because they reveal how experts perceive and interact with synthetic data across tasks (diagnostic tests and Turing tests). In experiments with sixteen radiologists, 96.6% of the images generated by the most recent state-of-the-art AI algorithm were identified as fake, demonstrating the limitations of generative AI in producing clinically accurate images.
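
The 96.6% figure is a Turing-test detection rate: the share of synthetic images that readers correctly flagged as fake. The summary does not say exactly how reads were aggregated, so this minimal sketch assumes the rate is pooled over reader-image pairs and also reports a per-reader breakdown. The `Read` record and both function names are hypothetical, and the toy data is purely illustrative, not the study's reads.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """One radiologist's Turing-test verdict on one image (hypothetical record layout)."""
    reader_id: str
    image_id: str
    is_synthetic: bool  # ground truth: the image was produced by a generative model
    judged_fake: bool   # the reader's answer: "this image is synthetic"


def fake_detection_rate(reads: list[Read]) -> float:
    """Share of reads on synthetic images that the reader labeled as fake (pooled)."""
    synthetic = [r for r in reads if r.is_synthetic]
    if not synthetic:
        raise ValueError("no reads of synthetic images")
    # Booleans sum as 0/1, so this counts the correct "fake" calls.
    return sum(r.judged_fake for r in synthetic) / len(synthetic)


def per_reader_rates(reads: list[Read]) -> dict[str, float]:
    """Detection rate for each reader who saw at least one synthetic image."""
    by_reader: dict[str, list[Read]] = defaultdict(list)
    for r in reads:
        by_reader[r.reader_id].append(r)
    return {
        reader: fake_detection_rate(rs)
        for reader, rs in by_reader.items()
        if any(x.is_synthetic for x in rs)
    }


# Toy data for illustration only; not the study's reads.
reads = [
    Read("r1", "syn-01", True, True),
    Read("r1", "syn-02", True, True),
    Read("r1", "real-01", False, False),
    Read("r2", "syn-01", True, False),
    Read("r2", "syn-02", True, True),
]
print(fake_detection_rate(reads))  # 0.75
print(per_reader_rates(reads))     # {'r1': 1.0, 'r2': 0.5}
```

Reporting per-reader rates alongside the pooled figure shows whether a high overall detection rate is driven by a few especially discerning readers or holds across the panel.
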
Problem

Research questions and friction points this paper is trying to address.

Evaluating synthetic medical images lacks clinical authenticity
Current metrics misalign with expert human recognition
Generative AI struggles to produce clinically accurate images
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines eye-tracking with radiological evaluations
Uses gaze patterns to assess synthetic images
Reveals AI limitations via expert perception
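
The summary does not specify which gaze measures GazeVal computes. As a hedged illustration of how gaze patterns can be compared across viewings (for example, between readers, or between real and synthetic images of comparable anatomy), this sketch builds duration-weighted fixation density maps and compares them with Pearson's correlation coefficient (CC), a common saliency-map similarity metric. The function names, the `(x, y, duration_ms)` fixation format, and the `sigma` value are assumptions, not GazeVal's implementation.

```python
import numpy as np


def fixation_density_map(fixations, shape, sigma=25.0):
    """Duration-weighted fixation density map, normalized to sum to 1.

    fixations: iterable of (x, y, duration_ms) in pixel coordinates (assumed format).
    shape: (height, width) of the displayed image.
    sigma: Gaussian spread in pixels (hypothetical default).
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]  # row (y) and column (x) index grids
    density = np.zeros((h, w), dtype=float)
    for x, y, duration in fixations:
        # Each fixation adds a Gaussian blob weighted by how long it lasted.
        density += duration * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma**2))
    total = density.sum()
    return density / total if total > 0 else density


def correlation_coefficient(map_a, map_b):
    """Pearson CC between two maps; 1.0 means identical spatial attention patterns."""
    sa, sb = map_a.std(), map_b.std()
    if sa == 0 or sb == 0:
        raise ValueError("CC is undefined for a constant map")
    za = (map_a - map_a.mean()) / sa
    zb = (map_b - map_b.mean()) / sb
    return float((za * zb).mean())  # mean product of z-scores = Pearson r


# Toy fixations for illustration only.
shape = (64, 64)
view_a = fixation_density_map([(16, 16, 300), (20, 18, 150)], shape, sigma=4.0)
view_b = fixation_density_map([(48, 48, 300)], shape, sigma=4.0)
print(correlation_coefficient(view_a, view_a))  # approximately 1.0
print(correlation_coefficient(view_a, view_b))  # well below 1.0: gaze landed elsewhere
```

In eye-tracking studies, `sigma` is often chosen to span roughly one degree of visual angle, which depends on screen size and viewing distance; the values here are placeholders.
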
Authors
David Wong, Northwestern University
Bin Wang, Northwestern University
Gorkem Durak, Northwestern University, Department of Radiology (radiology, artificial intelligence)
M. Tliba, University of Illinois at Chicago
Akshay Chaudhari, Stanford University
Aladine Chetouani, Institut Galilée - L2TI - Multimedia Team (Image Quality Assessment, Video Analysis, Deep Learning, Pattern Recognition)
Ahmet Enis Cetin, University of Illinois at Chicago
Ç. Topel, Northwestern University
Nicolo Gennaro, Northwestern University
C. Vendrami, Northwestern University
Tugce Agirlar Trabzonlu, Northwestern University
Amir Ali Rahsepar, Northwestern University (Cardiothoracic Imaging)
Laetitia Perronne, Northwestern University
Matthew Antalek, Northwestern University
Onural Ozturk, Northwestern University
Gokcan Okur, Loyola University Chicago
Andrew C. Gordon, Northwestern University
Ayis Pyrros, Neuroradiology, DuPage Medical Group (Radiology, machine learning)
Frank H. Miller, Northwestern University
Amir Borhani, Associate Professor of Radiology, Northwestern University Feinberg School of Medicine (Abdominal Imaging, Liver and Pancreaticobiliary Imaging)
Hatice Savas, Northwestern University
Eric Hart, Northwestern University
Drew Torigian, University of Pennsylvania
J. Udupa, University of Pennsylvania
Elizabeth Krupinski, Emory University
Ulas Bagci, Northwestern University (artificial intelligence, deep learning, biomedical image analysis, medical image computing)