Improving Automatic Summarization of Radiology Reports through Mid-Training of Large Language Models

📅 2026-02-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes a three-stage "pre-training, mid-training, fine-tuning" paradigm for automatic summarization of radiology reports, with the aim of reducing physician workload and improving the factual consistency of generated summaries. After clinical-domain pre-training, the approach adds a mid-training phase tailored to the radiology subdomain, which improves few-shot learning and could alleviate the "cold start" problem reported in earlier studies. The models are developed using large-scale clinical text from UF Health, and the mid-training and fine-tuning experiments use the OpenI and MIMIC-CXR benchmark datasets. The resulting model, GatorTronT5-Radio, outperforms the models without mid-training on both ROUGE-L (a text-based measure) and RadGraph-F1 (a factuality measure).

📝 Abstract

Automatic summarization of radiology reports is an essential application to reduce the burden on physicians. Previous studies have widely used the "pre-training, fine-tuning" strategy to adapt large language models (LLMs) for summarization. This study proposes subdomain adaptation through a mid-training method to improve summarization. We explored three adaptation strategies: (1) general-domain pre-training, (2) clinical-domain pre-training, and (3) clinical-domain pre-training followed by subdomain mid-training. We developed models using large-scale clinical text from the University of Florida (UF) Health and conducted mid-training and fine-tuning experiments using widely used benchmark datasets, including OpenI and MIMIC-CXR. The experimental results show that the mid-trained model, GatorTronT5-Radio, achieved the best performance, outperforming models without mid-training in both text-based measures (ROUGE-L) and factuality measures (RadGraph-F1). Our mid-training methods also demonstrate better few-shot learning and could alleviate the "cold start" problem reported in previous studies as a learning barrier. Our findings support the use of "pre-training, mid-training, fine-tuning" instead of the widely used direct fine-tuning strategy.
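To make the text-based measure concrete, here is a minimal sketch of how a ROUGE-L score can be computed from the longest common subsequence (LCS) of a reference summary and a generated summary. It is not the paper's evaluation code: the function names `lcs_length` and `rouge_l_f1` are hypothetical, and the sketch assumes lowercased whitespace tokenization, no stemming, and the balanced F-measure (equal weight on precision and recall). RadGraph-F1 is not sketched because it depends on a trained entity and relation extraction model.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)  # DP row for the previous token of `a`
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L as a balanced F-measure (assumption: F1, whitespace tokens)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative strings only, not taken from OpenI or MIMIC-CXR.
    reference = "no acute cardiopulmonary abnormality"
    candidate = "no acute abnormality"
    print(round(rouge_l_f1(reference, candidate), 3))
```

In this illustrative pair the candidate drops one reference token, so precision is 1.0 while recall is 0.75. Because the score only rewards token overlap in order, a summary can score well while misstating a finding, which is why the abstract pairs ROUGE-L with a factuality measure.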

Problem

Research questions and friction points this paper is trying to address.

Automatic Summarization

Radiology Reports

Large Language Models

Subdomain Adaptation

Mid-Training

Innovation

Methods, ideas, or system contributions that make the work stand out.

mid-training

radiology report summarization

large language models