CLEAR: A Clinically-Grounded Tabular Framework for Radiology Report Evaluation

📅 2025-05-22


🤖 AI Summary

Current automated evaluation metrics for radiology reports lack granularity and interpretability, so they miss clinically meaningful differences between reports. To address this, we propose a clinically grounded, tabular evaluation framework that scores report quality at the attribute level across six dimensions: lesion presence plus five key clinical attributes (onset, change, severity, anatomical location, and clinical recommendation). We introduce CLEAR-Bench, the first expert-annotated benchmark, curated by consensus among five board-certified radiologists. The framework combines rule- and model-based attribute extraction, knowledge-guided structured comparison, and a weighted multi-attribute consistency score. On CLEAR-Bench, the automated evaluation reaches a Pearson correlation of 0.89 with physician ratings, significantly outperforming conventional text-similarity metrics while remaining clinically faithful and interpretable.
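The summary describes a weighted multi-attribute consistency score validated by its Pearson correlation with physician ratings. The following is a minimal Python sketch of how those two steps could fit together; the `WEIGHTS` values, the attribute key names, and the choice to renormalize over attributes that are undefined for a report are hypothetical, since the summary does not specify them.

```python
import math
from typing import Dict, Optional, Sequence

# Hypothetical attribute weights: the summary mentions a weighted
# multi-attribute consistency score but does not report the weights.
WEIGHTS: Dict[str, float] = {
    "presence": 0.5,
    "first_occurrence": 0.1,
    "change": 0.1,
    "severity": 0.1,
    "location": 0.1,
    "recommendation": 0.1,
}


def weighted_consistency(per_attr: Dict[str, Optional[float]],
                         weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted mean of per-attribute agreement scores in [0, 1].

    Attributes that are undefined for a report (None or missing) are
    skipped, and the remaining weights are renormalized to sum to 1.
    """
    pairs = [(w, per_attr[k]) for k, w in weights.items()
             if per_attr.get(k) is not None]
    total = sum(w for w, _ in pairs)
    if total == 0:
        return 0.0
    return sum(w * v for w, v in pairs) / total


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```

In use, one would compute `weighted_consistency` for every report in the benchmark and pass the resulting list, together with the physician ratings for the same reports, to `pearson`. SciPy's `scipy.stats.pearsonr` computes the same coefficient and also returns a p-value.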

📝 Abstract

Existing metrics often lack the granularity and interpretability to capture nuanced clinical differences between candidate and ground-truth radiology reports, resulting in suboptimal evaluation. We introduce a Clinically-grounded tabular framework with Expert-curated labels and Attribute-level comparison for Radiology report evaluation (CLEAR). CLEAR not only examines whether a report can accurately identify the presence or absence of medical conditions, but also assesses whether it can precisely describe each positively identified condition across five key attributes: first occurrence, change, severity, descriptive location, and recommendation. Compared with prior work, CLEAR's multi-dimensional, attribute-level outputs enable a more comprehensive and clinically interpretable evaluation of report quality. Additionally, to measure the clinical alignment of CLEAR, we collaborate with five board-certified radiologists to develop CLEAR-Bench, a dataset of 100 chest X-ray reports from MIMIC-CXR, annotated across 6 curated attributes and 13 CheXpert conditions. Our experiments show that CLEAR achieves high accuracy in extracting clinical attributes and provides automated metrics that are strongly aligned with clinical judgment.
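To make the tabular, attribute-level comparison concrete, here is a minimal Python sketch. It assumes each report has already been converted into a table with one row per CheXpert condition, holding a `present` flag plus the five attributes. The key names, the `compare_tables` helper, exact matching of normalized attribute values, and the use of presence F1 are illustrative assumptions, not CLEAR's actual extraction or scoring pipeline.

```python
from typing import Dict, Optional

# The five attributes named in the abstract; these key names are hypothetical.
ATTRIBUTES = ("first_occurrence", "change", "severity", "location", "recommendation")

# One row per condition: {"present": bool, <attribute>: normalized value or None}.
ReportTable = Dict[str, Dict[str, object]]


def compare_tables(candidate: ReportTable,
                   reference: ReportTable) -> Dict[str, Optional[float]]:
    """Compare two condition-by-attribute tables.

    Returns presence F1 over all conditions, plus, for each attribute, the
    exact-match rate on conditions that both reports mark as present
    (None when the reference never describes that attribute).
    """
    tp = fp = fn = 0
    matches = {a: 0 for a in ATTRIBUTES}
    totals = {a: 0 for a in ATTRIBUTES}
    for cond in sorted(set(candidate) | set(reference)):
        cand = candidate.get(cond, {})
        ref = reference.get(cond, {})
        c_pos = bool(cand.get("present", False))
        r_pos = bool(ref.get("present", False))
        if c_pos and r_pos:
            tp += 1
            for a in ATTRIBUTES:
                # Only score attributes that the reference actually describes.
                if ref.get(a) is not None:
                    totals[a] += 1
                    matches[a] += int(cand.get(a) == ref.get(a))
        elif c_pos:
            fp += 1
        elif r_pos:
            fn += 1
    scores: Dict[str, Optional[float]] = {
        # F1 on the positive class; defined as 1.0 when neither report has positives.
        "presence_f1": 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    }
    for a in ATTRIBUTES:
        scores[a] = matches[a] / totals[a] if totals[a] else None
    return scores


# Toy tables for illustration only (not drawn from CLEAR-Bench).
reference = {
    "Pleural Effusion": {"present": True, "change": "worsened", "severity": "moderate", "location": "left"},
    "Cardiomegaly": {"present": True, "severity": "mild"},
    "Pneumothorax": {"present": False},
}
candidate = {
    "Pleural Effusion": {"present": True, "change": "worsened", "severity": "small", "location": "left"},
    "Pneumothorax": {"present": True},
}
print(compare_tables(candidate, reference))
# {'presence_f1': 0.5, 'first_occurrence': None, 'change': 1.0, 'severity': 0.0, 'location': 1.0, 'recommendation': None}
```

Scoring attributes only on conditions that both reports mark as present keeps the attribute scores separate from presence errors (the missed cardiomegaly and the spurious pneumothorax), which are already counted in presence F1.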

Problem

Lack of granularity in evaluating radiology report differences

Need for clinically interpretable multi-dimensional report assessment

Insufficient alignment between automated metrics and clinical judgment

Innovation

Clinically-grounded tabular framework for evaluation

Multi-dimensional attribute-level comparison outputs

Automated metrics aligned with clinical judgment