Improving Automatic Summarization of Radiology Reports through Mid-Training of Large Language Models

📅 2026-02-28

🤖 AI Summary
This work proposes a three-stage "pre-training, mid-training, fine-tuning" paradigm for automatic summarization of radiology reports, aiming to reduce physician workload and improve the factual consistency of generated summaries. After clinical-domain pre-training, the approach adds a mid-training phase tailored to the radiology subdomain, which improves few-shot learning and could alleviate the "cold start" problem. The models are developed using large-scale clinical text from UF Health, with mid-training and fine-tuning experiments on the OpenI and MIMIC-CXR benchmark datasets. The mid-trained model, GatorTronT5-Radio, achieves the best performance, outperforming models without mid-training on both ROUGE-L and RadGraph-F1.

📝 Abstract
Automatic summarization of radiology reports is an essential application to reduce the burden on physicians. Previous studies have widely used the "pre-training, fine-tuning" strategy to adapt large language models (LLMs) for summarization. This study proposed a subdomain adaptation through a mid-training method to improve summarization. We explored three adaptation strategies: (1) general-domain pre-training, (2) clinical-domain pre-training, and (3) clinical-domain pre-training followed by subdomain mid-training. We developed models using large-scale clinical text from the University of Florida (UF) Health and conducted mid-training and fine-tuning experiments using widely used benchmark datasets including OpenI and MIMIC-CXR. The experimental results show that the mid-trained model, GatorTronT5-Radio, achieved the best performance, outperforming models without mid-training in both text-based measures (ROUGE-L) and factuality measures (RadGraph-F1). Our mid-training methods also demonstrate better few-shot learning and could alleviate the "cold start" problem reported in previous studies as a learning barrier. Our findings support the use of "pre-training, mid-training, fine-tuning," instead of the widely used direct fine-tuning strategy.
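For readers unfamiliar with the text-based measure named in the abstract, the following is a minimal sketch of a sentence-level ROUGE-L F1 score, computed from the longest common subsequence (LCS) of tokens between a reference and a generated summary. It assumes lowercase whitespace tokenization and an unweighted F1; the function names and example sentences are hypothetical, and the abstract does not state the authors' exact evaluation setup, which may differ (for example in tokenization or stemming). RadGraph-F1 is not sketched here because it depends on a trained entity and relation extraction model.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Sentence-level ROUGE-L F1 with simple whitespace tokenization (an assumption)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: a reference impression vs. a generated summary.
score = rouge_l_f1("no acute cardiopulmonary abnormality", "no acute abnormality")
print(round(score, 4))
```

Here the LCS has three tokens, so precision is 3/3 and recall is 3/4. A higher score means more in-order token overlap with the reference; it says nothing about clinical correctness, which is why the abstract pairs it with a factuality measure.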
Problem

Research questions and friction points this paper is trying to address.

Automatic Summarization
Radiology Reports
Large Language Models
Subdomain Adaptation
Mid-Training
Innovation

Methods, ideas, or system contributions that make the work stand out.

mid-training
radiology report summarization
large language models
subdomain adaptation
factuality evaluation
Mengxian Lyu
Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
Cheng Peng
Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
Ziyi Chen
Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
Mengyuan Zhang
Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
Jieting Li Lu
Department of Engineering Education, University of Florida, Gainesville, FL, USA
Yonghui Wu
Associate Professor, University of Florida
Natural Language Processing, Machine Learning, Medical Informatics, Pharmacovigilance