Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs

Published: 2026-05-08
Citations: 0
Influential: 0

AI Summary
This work addresses the challenge of structuring unstructured free-text cardiac magnetic resonance (CMR) reports by proposing CMR-EXTR, the first framework in the CMR domain to integrate confidence estimation into structured information extraction. Combining large language model distillation, uncertainty quantification, and structured extraction algorithms, the approach leverages teacher-student distillation for efficient offline inference. It generates field-level confidence scores guided by three principles: distributional plausibility, sampling stability, and cross-field consistency, thereby prioritizing human review efforts. The system achieves 99.65% accuracy at the variable level, substantially reducing manual annotation burden and offering robust support for clinical cohort construction, longitudinal data integration, and automated report validation.
๐Ÿ“ Abstract
Converting free-text cardiac magnetic resonance (CMR) reports into auditable structured data remains a bottleneck for cohort assembly, longitudinal curation, and clinical decision support. We present CMR-EXTR, a lightweight framework that converts free-text CMR reports into structured data and assigns per-field confidence for quality control. A teacher-student distillation pipeline enables fully offline inference while limiting manual annotation. Uncertainty integrates three complementary principles -- distribution plausibility, sampling stability, and cross-field consistency -- to triage human review. Experiments show that CMR-EXTR achieves 99.65% variable-level accuracy, demonstrating both reliable extraction and informative confidence scores. To our knowledge, this is the first CMR-specific extraction system with integrated confidence estimation. The code is available at https://github.com/yuyi1005/CMR-EXTR.
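To make the three uncertainty principles concrete, here is a minimal Python sketch of how a per-field confidence could be assembled for a single field (LVEF). It is not the CMR-EXTR implementation: the function names, the plausible range, the 5-point tolerance, and the choice of taking the minimum of the three signals are all assumptions made for illustration. The only domain fact used is the standard definition of ejection fraction, EF = (EDV - ESV) / EDV x 100.

```python
from collections import Counter


def sampling_stability(samples):
    """Return the most frequent sampled value and the fraction of samples agreeing with it."""
    value, count = Counter(samples).most_common(1)[0]
    return value, count / len(samples)


def distribution_plausibility(value, low, high):
    """Score 1.0 inside an assumed plausible range, decaying linearly to 0.0 outside it."""
    if value is None:
        return 0.0
    if low <= value <= high:
        return 1.0
    distance = (low - value) if value < low else (value - high)
    return max(0.0, 1.0 - distance / (high - low))


def cross_field_consistency(lvef, lvedv, lvesv, tol=5.0):
    """Score 1.0 if the reported LVEF matches the EF derived from volumes within tol points."""
    if lvedv <= 0:
        return 0.0
    derived = (lvedv - lvesv) / lvedv * 100.0
    return 1.0 if abs(derived - lvef) <= tol else 0.0


def field_confidence(lvef_samples, lvedv, lvesv, plausible_range=(10.0, 90.0)):
    """Combine the three signals with min(); the combination rule is an assumption."""
    value, stability = sampling_stability(lvef_samples)
    plausibility = distribution_plausibility(value, *plausible_range)
    consistency = cross_field_consistency(value, lvedv, lvesv)
    return value, min(stability, plausibility, consistency)


# Five hypothetical repeated extractions of LVEF from the same report;
# one sample dropped a digit, which lowers sampling stability.
value, confidence = field_confidence([55.0, 55.0, 55.0, 5.5, 55.0], lvedv=150.0, lvesv=67.5)
print(value, confidence)
```

In a triage setting, fields whose confidence falls below a chosen threshold would be routed to a human reviewer, while the rest are accepted automatically.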
Problem

Research questions and friction points this paper is trying to address.

structured data extraction
cardiac magnetic resonance
uncertainty estimation
clinical reports
data curation
Innovation

Methods, ideas, or system contributions that make the work stand out.

distilled LLMs
uncertainty estimation
structured data extraction
CMR reports
teacher-student distillation