CLEAR: A Clinically-Grounded Tabular Framework for Radiology Report Evaluation

📅 2025-05-22
🤖 AI Summary
Current automated evaluation metrics for radiology reports lack granularity and interpretability, so they fail to capture clinically meaningful differences between reports. To address this, we propose a clinically driven, tabular evaluation framework that assesses report quality at the attribute level across six dimensions: the presence or absence of each condition, plus five key clinical attributes (onset, change, severity, anatomical location, and clinical recommendation). Comparing reports along all six dimensions gives a multi-dimensional view of where a candidate report agrees or disagrees with the reference. We introduce CLEAR-Bench, the first expert-annotated benchmark, curated by consensus among five board-certified radiologists. Our framework integrates rule- and model-based attribute extraction, knowledge-guided structured comparison, and a multi-attribute weighted consistency scoring mechanism. On CLEAR-Bench, our automated evaluation achieves a Pearson correlation of 0.89 with physician ratings, significantly outperforming conventional text-similarity metrics while delivering high clinical fidelity and strong interpretability.
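The summary does not spell out how the weighted consistency score is computed. One plausible reading, sketched here purely as an assumption, is a weighted average of per-attribute consistency scores in [0, 1]. The attribute keys, the weights, and the function name `weighted_consistency` are hypothetical and are not taken from the paper.

```python
from typing import Mapping

# Hypothetical keys for the six dimensions named in the summary.
ATTRIBUTES = ("presence", "first_occurrence", "change", "severity", "location", "recommendation")


def weighted_consistency(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted mean of per-attribute consistency scores, each expected in [0, 1].

    Assumption: CLEAR's actual weighting scheme may differ; this is only an illustration.
    """
    for attr, score in scores.items():
        if attr not in ATTRIBUTES:
            raise KeyError(f"unknown attribute: {attr}")
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {attr} must lie in [0, 1], got {score}")
    # Normalize only over the attributes that were actually scored.
    total_weight = sum(weights.get(attr, 0.0) for attr in scores)
    if total_weight <= 0:
        raise ValueError("weights of the scored attributes must sum to a positive value")
    return sum(weights.get(attr, 0.0) * score for attr, score in scores.items()) / total_weight


# Toy example: presence weighted twice as heavily as each descriptive attribute (hypothetical weights).
scores = {"presence": 1.0, "first_occurrence": 0.5, "change": 1.0,
          "severity": 0.0, "location": 1.0, "recommendation": 1.0}
weights = {"presence": 2.0, "first_occurrence": 1.0, "change": 1.0,
           "severity": 1.0, "location": 1.0, "recommendation": 1.0}
print(round(weighted_consistency(scores, weights), 4))  # 5.5 / 7 ≈ 0.7857
```

Attributes that were not compared, for example because neither report describes them, can simply be left out of `scores`. The weights are then renormalized over the attributes that remain.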

📝 Abstract
Existing metrics often lack the granularity and interpretability to capture nuanced clinical differences between candidate and ground-truth radiology reports, resulting in suboptimal evaluation. We introduce a Clinically-grounded tabular framework with Expert-curated labels and Attribute-level comparison for Radiology report evaluation (CLEAR). CLEAR not only examines whether a report can accurately identify the presence or absence of medical conditions, but also assesses whether it can precisely describe each positively identified condition across five key attributes: first occurrence, change, severity, descriptive location, and recommendation. Compared with prior work, CLEAR's multi-dimensional, attribute-level outputs enable a more comprehensive and clinically interpretable evaluation of report quality. Additionally, to measure the clinical alignment of CLEAR, we collaborate with five board-certified radiologists to develop CLEAR-Bench, a dataset of 100 chest X-ray reports from MIMIC-CXR, annotated across 6 curated attributes and 13 CheXpert conditions. Our experiments show that CLEAR achieves high accuracy in extracting clinical attributes and provides automated metrics that are strongly aligned with clinical judgment.
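To make the tabular idea concrete, here is a sketch that represents each report as a table with one row per CheXpert condition (presence plus the five attributes). It scores presence with F1 and measures attribute agreement only for conditions that both reports mark as present. Everything in it is an assumption for illustration:

- The class and function names are hypothetical.
- The 13 conditions are taken to be the CheXpert observations other than "No Finding".
- Exact string matching is a simplistic stand-in for CLEAR's own comparison step.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumption: the 13 conditions are the CheXpert observations other than "No Finding".
CHEXPERT_CONDITIONS = (
    "Enlarged Cardiomediastinum", "Cardiomegaly", "Lung Opacity", "Lung Lesion",
    "Edema", "Consolidation", "Pneumonia", "Atelectasis", "Pneumothorax",
    "Pleural Effusion", "Pleural Other", "Fracture", "Support Devices",
)

# Hypothetical keys for the five attributes named in the abstract.
ATTRIBUTES = ("first_occurrence", "change", "severity", "location", "recommendation")


@dataclass
class ConditionRow:
    """One row of a report table (hypothetical structure)."""
    present: bool
    # Free-text attribute values; a missing key or None means "not described".
    attributes: dict[str, Optional[str]] = field(default_factory=dict)


# Condition name -> row; conditions missing from the dict are treated as absent.
ReportTable = dict[str, ConditionRow]


def _is_positive(table: ReportTable, condition: str) -> bool:
    row = table.get(condition)
    return row is not None and row.present


def presence_f1(candidate: ReportTable, reference: ReportTable) -> float:
    """F1 of the candidate's positive findings against the reference over the 13 conditions."""
    tp = fp = fn = 0
    for cond in CHEXPERT_CONDITIONS:
        cand_pos, ref_pos = _is_positive(candidate, cond), _is_positive(reference, cond)
        tp += int(cand_pos and ref_pos)
        fp += int(cand_pos and not ref_pos)
        fn += int(ref_pos and not cand_pos)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom  # both all-negative: perfect agreement


def _normalize(value: Optional[str]) -> Optional[str]:
    return value.strip().lower() if value else None


def attribute_agreement(candidate: ReportTable, reference: ReportTable) -> dict[str, float]:
    """Per-attribute exact-match rate over conditions both reports mark as present.

    Exact string matching is a simplistic stand-in for CLEAR's knowledge-guided comparison.
    """
    matches = dict.fromkeys(ATTRIBUTES, 0)
    counts = dict.fromkeys(ATTRIBUTES, 0)
    for cond in CHEXPERT_CONDITIONS:
        if not (_is_positive(candidate, cond) and _is_positive(reference, cond)):
            continue
        for attr in ATTRIBUTES:
            cand_val = _normalize(candidate[cond].attributes.get(attr))
            ref_val = _normalize(reference[cond].attributes.get(attr))
            if cand_val is None and ref_val is None:
                continue  # neither report describes this attribute
            counts[attr] += 1
            matches[attr] += int(cand_val == ref_val)
    # Attributes never described for any shared positive condition are omitted.
    return {a: matches[a] / counts[a] for a in ATTRIBUTES if counts[a] > 0}


reference = {
    "Pleural Effusion": ConditionRow(True, {"severity": "moderate", "location": "left"}),
    "Cardiomegaly": ConditionRow(True, {"change": "stable"}),
}
candidate = {
    "Pleural Effusion": ConditionRow(True, {"severity": "Small", "location": "Left"}),
    "Edema": ConditionRow(True),
}
print(presence_f1(candidate, reference))          # 0.5 (1 TP, 1 FP, 1 FN)
print(attribute_agreement(candidate, reference))  # {'severity': 0.0, 'location': 1.0}
```

The sketch keeps presence and attribute scores separate, in the spirit of the multi-dimensional outputs described above. A report can find the right conditions yet still misdescribe them. In the toy example, the candidate correctly flags the pleural effusion and its location, but it misstates the severity.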
Problem

Research questions and friction points this paper is trying to address.

Lack of granularity in evaluating radiology report differences
Need for clinically interpretable multi-dimensional report assessment
Insufficient alignment between automated metrics and clinical judgment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Clinically-grounded tabular framework for evaluation
Multi-dimensional attribute-level comparison outputs
Automated metrics aligned with clinical judgment
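The AI summary quantifies alignment with clinical judgment as a Pearson correlation (0.89) between the automated scores and physician ratings on CLEAR-Bench. Below is a minimal, dependency-free sketch of that statistic. The function name `pearson_r` and the toy numbers are illustrative only and are not data from the paper.

```python
import math
from typing import Sequence


def pearson_r(metric_scores: Sequence[float], physician_ratings: Sequence[float]) -> float:
    """Pearson correlation between per-report metric scores and physician ratings."""
    if len(metric_scores) != len(physician_ratings) or len(metric_scores) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    n = len(metric_scores)
    mean_x = sum(metric_scores) / n
    mean_y = sum(physician_ratings) / n
    # Sums of centered cross-products and squares; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(metric_scores, physician_ratings))
    var_x = sum((x - mean_x) ** 2 for x in metric_scores)
    var_y = sum((y - mean_y) ** 2 for y in physician_ratings)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when either sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Toy values, not from CLEAR-Bench.
print(round(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]), 4))  # 0.8
```

In practice, `scipy.stats.pearsonr` computes the same coefficient and also returns a p-value. On Python 3.10+, `statistics.correlation` is another standard-library option.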

Authors

Yuyang Jiang
HKUST/CAA
Art therapy, Art education, Art metaverse

Chacha Chen
University of Chicago
Human-centered ML

Shengyuan Wang
Tsinghua University

Feng Li
University of Chicago

Zecong Tang
Zhejiang University

Benjamin M. Mervak
University of Michigan

Lydia Chelala
University of Chicago

Christopher M Straus
University of Chicago

Reve Chahine
University of Michigan

Samuel G. Armato
University of Chicago

Chenhao Tan
University of Chicago
Human-centered AI, Communication & Intelligence, Scientific Discovery, AI alignment, AI governance