S-RRG-Bench: Structured Radiology Report Generation with Fine-Grained Evaluation Framework

📅 2025-08-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Free-text radiology reports suffer from linguistic redundancy and inconsistency, while existing structured approaches that rely on predefined templates or label sets produce fragmented outputs and often omit nuanced clinical details. To address these limitations, this work introduces the first end-to-end structured report generation framework for chest X-ray interpretation. The authors construct MIMIC-STRUC, the first publicly available dataset that explicitly models four clinically essential elements (disease name, anatomical location, severity level, and probability) in a unified manner. The method uses a template-free, large language model-driven generation approach, eliminating rigid schema constraints. The work also proposes S-Score, a fine-grained, clinically oriented evaluation metric grounded in radiological reasoning. Experiments show that the framework significantly outperforms both visual question answering (VQA)-based and template-based baselines in report accuracy, clinical consistency, and interpretability. S-Score achieves strong correlation with human expert assessment (r = 0.92), establishing a standardized paradigm for AI-powered radiology reporting.

📝 Abstract

Radiology report generation (RRG) for diagnostic images, such as chest X-rays, plays a pivotal role in both clinical practice and AI. Traditional free-text reports suffer from redundancy and inconsistent language, complicating the extraction of critical clinical details. Structured radiology report generation (S-RRG) offers a promising solution by organizing information into standardized, concise formats. However, existing approaches often rely on classification or visual question answering (VQA) pipelines that require predefined label sets and produce only fragmented outputs. Template-based approaches, which generate reports by replacing keywords within fixed sentence patterns, further compromise expressiveness and often omit clinically important details. In this work, we present a novel approach to S-RRG that comprises dataset construction, model training, and a new evaluation framework. We first create a chest X-ray dataset (MIMIC-STRUC) that includes disease names, severity levels, probabilities, and anatomical locations, ensuring that the dataset is both clinically relevant and well-structured. We then train an LLM-based model to generate standardized, high-quality reports. To assess the generated reports, we propose a specialized evaluation metric (S-Score) that measures not only disease prediction accuracy but also the precision of disease-specific details. The metric focuses on elements critical to clinical decision-making and shows stronger alignment with human assessments. Our approach highlights the effectiveness of structured reports and the importance of a tailored evaluation metric for S-RRG.
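To make the four annotated elements concrete, here is a minimal sketch of how one structured finding might be represented and serialized. The class name `Finding`, the field names, the example values, and the one-line-per-finding output format are all assumptions for illustration; the abstract does not specify the actual MIMIC-STRUC schema or report layout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Finding:
    """One structured finding. Field names and values are hypothetical."""
    disease: str
    location: Optional[str] = None      # anatomical location, e.g. "left lower lobe"
    severity: Optional[str] = None      # e.g. "mild", "moderate", "severe"
    probability: Optional[str] = None   # certainty wording, e.g. "likely"


def render_report(findings: List[Finding]) -> str:
    """Serialize findings one per line, skipping attributes that are not set."""
    lines = []
    for f in findings:
        parts = [f"disease: {f.disease}"]
        if f.location:
            parts.append(f"location: {f.location}")
        if f.severity:
            parts.append(f"severity: {f.severity}")
        if f.probability:
            parts.append(f"probability: {f.probability}")
        lines.append("; ".join(parts))
    return "\n".join(lines)


if __name__ == "__main__":
    report = [
        Finding("pleural effusion", "left", "small", "likely"),
        Finding("cardiomegaly", severity="mild"),
    ]
    print(render_report(report))
```

Keeping each attribute optional reflects that a report may mention a disease without stating, say, its severity; whether the real dataset allows missing attributes is not stated in the abstract.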

Problem

Research questions and friction points this paper is trying to address.

Traditional radiology reports are redundant and inconsistent

Existing structured report methods lack expressiveness and omit details

Current evaluation metrics fail to assess clinically critical elements

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based model for standardized report generation

MIMIC-STRUC dataset with detailed clinical annotations

S-Score metric for clinically meaningful evaluation
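The idea of scoring both disease prediction and disease-specific details can be illustrated with a toy metric. This is not the paper's S-Score, whose definition is not given here; the functions `disease_f1`, `attribute_accuracy`, and `toy_structured_score`, the attribute names, exact-match comparison, and the `detail_weight` parameter are all assumptions chosen for illustration.

```python
from typing import Dict

# A report maps each disease name to its attributes (hypothetical layout).
Report = Dict[str, Dict[str, str]]

ATTRIBUTES = ("location", "severity", "probability")


def disease_f1(pred: Report, ref: Report) -> float:
    """F1 over disease names shared by the predicted and reference reports."""
    matched = len(set(pred) & set(ref))
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


def attribute_accuracy(pred: Report, ref: Report) -> float:
    """Exact-match accuracy of reference attributes on correctly predicted diseases."""
    total = 0
    correct = 0
    for disease in set(pred) & set(ref):
        for attr in ATTRIBUTES:
            if attr in ref[disease]:
                total += 1
                if pred[disease].get(attr) == ref[disease][attr]:
                    correct += 1
    return correct / total if total else 0.0


def toy_structured_score(pred: Report, ref: Report, detail_weight: float = 0.5) -> float:
    """Weighted mix of disease-level F1 and attribute-level accuracy."""
    return ((1 - detail_weight) * disease_f1(pred, ref)
            + detail_weight * attribute_accuracy(pred, ref))


if __name__ == "__main__":
    ref = {
        "pneumonia": {"location": "right lower lobe", "severity": "mild"},
        "pleural effusion": {"location": "left"},
    }
    pred = {"pneumonia": {"location": "right lower lobe", "severity": "moderate"}}
    print(toy_structured_score(pred, ref))
```

In this example the prediction finds one of two reference diseases and gets one of that disease's two attributes right, so both components penalize it separately; a text-overlap metric would not distinguish those two kinds of error.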
