GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation

📅 2025-03-07
🤖 AI Summary
Existing evaluation metrics for medical report generation primarily measure coverage of key medical information and neglect clinically critical fine-grained aspects such as lesion location, severity, and diagnostic uncertainty, which leaves reliability assessment incomplete. To address this, we propose GEMA-Score, a multi-agent collaborative framework for radiology report evaluation. Our method combines objective, entity-level parsing of diagnosis, location, severity, and uncertainty, scored with a named-entity-recognition F1 (NER-F1), with an LLM-driven scoring module for subjective dimensions, which yields structured feedback and clinical interpretability. Evaluated on ReXVal and RadEvalX, our framework reaches Kendall correlation coefficients of 0.70 and 0.54 with expert ratings, the highest among the compared metrics. A project demo is publicly available on GitHub.

📝 Abstract
Automatic medical report generation supports clinical diagnosis, reduces the workload of radiologists, and holds the promise of improving diagnostic consistency. However, existing evaluation metrics primarily assess how accurately generated reports cover key medical information compared to human-written reports, while overlooking crucial details such as the location and certainty of reported abnormalities. These limitations hinder comprehensive assessment of the reliability of generated reports and pose risks in their selection for clinical use. Therefore, in this paper we propose a Granular Explainable Multi-Agent Score (GEMA-Score), which conducts both objective quantification and subjective evaluation through a large language model-based multi-agent workflow. Our GEMA-Score parses structured reports and employs NER-F1 calculations through interactive exchanges of information among agents to assess disease diagnosis, location, severity, and uncertainty. Additionally, an LLM-based scoring agent evaluates completeness, readability, and clinical terminology while providing explanatory feedback. Extensive experiments validate that GEMA-Score achieves the highest correlation with human expert evaluations on public datasets, demonstrating its effectiveness in clinical scoring (Kendall coefficient = 0.70 on the ReXVal dataset and 0.54 on the RadEvalX dataset). The anonymous project demo is available at: https://github.com/Zhenxuan-Zhang/GEMA_score.
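
The agreement figures quoted above are Kendall rank correlations between metric scores and expert ratings. As a rough illustration of what that coefficient measures, here is a minimal pure-Python sketch of the tau-b variant. The abstract does not say which variant the authors used, so tau-b is an assumption, and the function name and sample numbers are made up for the example and are not taken from ReXVal or RadEvalX.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall tau-b between two equal-length score lists (tie-aware)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both rankings: contributes to neither factor
        elif dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1  # both rankings order this pair the same way
        else:
            discordant += 1  # the rankings disagree on this pair
    untied = concordant + discordant
    denom = sqrt((untied + ties_x_only) * (untied + ties_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical data: an automatic metric's score per report and an
# expert quality rating for the same four reports.
metric_scores = [0.9, 0.7, 0.4, 0.2]
expert_ratings = [5, 4, 2, 3]
tau = kendall_tau_b(metric_scores, expert_ratings)
print(round(tau, 3))
```

Five of the six report pairs are ordered the same way by both lists and one is reversed, so the coefficient here is (5 - 1) / 6. A value of 1 would mean the metric ranks every pair of reports exactly as the experts do.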
Problem

Research questions and friction points this paper is trying to address.

Existing metrics mainly score coverage of key medical information in generated reports
Location, severity, and uncertainty of reported abnormalities go unassessed
Incomplete reliability assessment makes selecting reports for clinical use risky
Innovation

Methods, ideas, or system contributions that make the work stand out.

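LLM-based multi-agent workflow combining objective and subjective report evaluation
NER-F1 calculations assess disease diagnosis, location, severity, and uncertainty
LLM-based scoring agent rates completeness, readability, and clinical terminology with explanatory feedback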
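
To make the NER-F1 idea concrete, the following is a minimal sketch of a per-category entity F1, assuming the entities have already been extracted from the reference and generated reports (in the paper that extraction is done by LLM agents, which this sketch does not reproduce). The function names, the exact-match comparison, the both-empty convention, and the sample entities are all illustrative assumptions, not the authors' implementation.

```python
CATEGORIES = ("disease", "location", "severity", "uncertainty")


def entity_f1(reference, candidate):
    """F1 between two collections of entity strings using exact set matching."""
    ref, cand = set(reference), set(candidate)
    if not ref and not cand:
        return 1.0  # assumed convention: nothing to find and nothing reported
    true_pos = len(ref & cand)
    precision = true_pos / len(cand) if cand else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def granular_f1(ref_entities, cand_entities):
    """Per-category F1 over pre-extracted entities keyed by category name."""
    return {
        cat: entity_f1(ref_entities.get(cat, ()), cand_entities.get(cat, ()))
        for cat in CATEGORIES
    }


# Hypothetical entities for one reference report and one generated report.
reference = {
    "disease": {"pleural effusion", "cardiomegaly"},
    "location": {"left lower lobe"},
    "severity": {"mild"},
    "uncertainty": {"possible"},
}
generated = {
    "disease": {"pleural effusion"},
    "location": {"left lower lobe"},
    "severity": {"moderate"},
    "uncertainty": set(),
}
scores = granular_f1(reference, generated)
for category, value in scores.items():
    print(f"{category}: {value:.2f}")
```

In this toy case the generated report finds one of two diseases and the right location but gets severity wrong and drops the uncertainty qualifier, which a single coverage-style score would blur together. Keeping the categories separate is what lets the score point at the specific kind of error.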
Zhenxuan Zhang
Georgia Institute of Technology
Kinhei Lee
Bioengineering Department and Imperial-X, Imperial College London, London W12 7SL, UK
Weihang Deng
Bioengineering Department and Imperial-X, Imperial College London, London W12 7SL, UK
Huichi Zhou
University College London
AI4Science
Zihao Jin
Bioengineering Department and Imperial-X, Imperial College London, London W12 7SL, UK
Jiahao Huang
Bioengineering Department and Imperial-X, Imperial College London, London W12 7SL, UK; National Heart and Lung Institute, Imperial College London, London SW7 2AZ, UK
Zhifan Gao
Sun Yat-sen University
Medical Image Analysis, Computer Vision, Machine Learning
Dominic C. Marshall
Bioengineering Department and Imperial-X, Imperial College London, London W12 7SL, UK
Yingying Fang
Imperial College London
Guang Yang
Bioengineering Department and Imperial-X, Imperial College London, London W12 7SL, UK; National Heart and Lung Institute, Imperial College London, London SW7 2AZ, UK; Cardiovascular Research Centre, Royal Brompton Hospital, London SW3 6NP, UK; School of Biomedical Engineering & Imaging Sciences, King’s College London, London WC2R 2LS, UK