Evaluating Automated Radiology Report Quality Through Fine-Grained Phrasal Grounding of Clinical Findings

📅 2024-12-02
🏛️ IEEE International Symposium on Biomedical Imaging
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses a gap in the evaluation of generative AI reports: descriptions of clinical findings, including their location, laterality, and severity, are not checked against the corresponding anatomical regions in the chest X-ray image. To resolve this, we propose the first anatomy-grounded, multimodal report quality assessment method. Our approach integrates clinical named entity recognition, fine-grained relation extraction, cross-modal phrase-to-image grounding, and multi-source consistency scoring to localize textual descriptions at the phrase level onto anatomical regions in chest radiographs and to validate the textual and visual evidence jointly. Compared to conventional text-only metrics (e.g., BLEU, BERTScore), our method demonstrates significantly higher correlation with radiologist expert ratings on a standard ground-truth dataset (p < 0.01). It overcomes key limitations of traditional evaluation paradigms by enabling interpretable, anatomy-aware, and empirically verifiable assessment, establishing a novel benchmark for clinically trustworthy AI-assisted diagnostic reporting.

📝 Abstract
Several evaluation metrics have been developed recently to automatically assess the quality of generative AI reports for chest radiographs based only on textual information, using lexical, semantic, or clinical named entity recognition methods. In this paper, we develop a new method of report quality evaluation by first extracting fine-grained finding patterns that capture the location, laterality, and severity of a large number of clinical findings. We then perform phrasal grounding to localize their associated anatomical regions on chest radiograph images. The textual and visual measures are then combined to rate the quality of the generated reports. We present results that compare this evaluation metric with other textual metrics on gold standard datasets.
Problem

Research questions and friction points this paper is trying to address.

Automated evaluation of radiology report quality
Fine-grained clinical finding pattern extraction
Combining textual and visual measures for accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts fine-grained clinical finding patterns
Performs phrasal grounding for anatomical localization
Combines textual and visual measures for evaluation
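
The combination of textual and visual measures described above can be illustrated with a small sketch. This is not the authors' implementation: the `Finding` structure, the attribute-matching rule, the use of intersection-over-union (IoU) for grounded regions, the greedy matching, and the `alpha` weight are all assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (x_min, y_min, x_max, y_max) in image coordinates; a hypothetical box format.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Finding:
    """Hypothetical fine-grained finding pattern with an optional grounded region."""
    name: str
    location: str
    laterality: str
    severity: str
    box: Optional[Box] = None


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def text_match(gen: Finding, ref: Finding) -> float:
    """Textual agreement: 0 if the finding names differ, otherwise the
    fraction of the four fields (name plus three attributes) that agree."""
    if gen.name != ref.name:
        return 0.0
    attrs = [
        gen.location == ref.location,
        gen.laterality == ref.laterality,
        gen.severity == ref.severity,
    ]
    return (1 + sum(attrs)) / 4


def visual_match(gen: Finding, ref: Finding) -> float:
    """Visual agreement: IoU of grounded regions, 0 if either is missing."""
    if gen.box is None or ref.box is None:
        return 0.0
    return iou(gen.box, ref.box)


def report_score(generated: List[Finding], reference: List[Finding],
                 alpha: float = 0.5) -> float:
    """Assumed combination rule: for each reference finding, take the best
    weighted text/visual match among same-named generated findings, then
    normalize by the larger list so extra or missing findings lower the score."""
    if not generated and not reference:
        return 1.0
    total = 0.0
    for ref in reference:
        best = 0.0
        for gen in generated:
            t = text_match(gen, ref)
            if t == 0.0:
                continue  # different finding; do not credit region overlap
            best = max(best, alpha * t + (1 - alpha) * visual_match(gen, ref))
        total += best
    return total / max(len(generated), len(reference))
```

Under these assumptions, a generated report that names the right finding but places it on the wrong side loses credit twice: once for the laterality mismatch in the text, and once because its grounded region does not overlap the reference region. A text-only metric would see only the first penalty.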
Raziuddin Mahmood
Rensselaer Polytechnic Institute, Troy, NY USA.
Pingkun Yan
P.K. Lashmet Chair Professor and Department Head of BME, Rensselaer Polytechnic Institute
Medical image computing, AI/ML, image-guided intervention and surgical planning
D. M. Reyes
Rensselaer Polytechnic Institute, Troy, NY USA.
Ge Wang
Rensselaer Polytechnic Institute, Troy, NY USA.
M. Kalra
Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, USA.
P. Kaviani
Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, USA.
Joy T. Wu
IBM Research, Almaden, San Jose, CA, USA
Tanveer F. Syeda-Mahmood
IBM Research, Almaden, San Jose, CA, USA