Fine-Tuning Open-Source Large Language Models to Improve Their Performance on Radiation Oncology Tasks: A Feasibility Study to Investigate Their Potential Clinical Applications in Radiation Oncology

📅 2025-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit limited clinical applicability in cancer radiotherapy decision-making due to insufficient domain-specific adaptation. Method: This study conducts the first systematic validation of domain-adaptive fine-tuning for open-source LLMs—LLaMA2-7B and Mistral-7B—using LoRA on 7,903 structured oncology cases. Supervised fine-tuning targets three critical radiotherapy tasks: treatment plan generation, modality selection (photon/proton/electron/brachytherapy), and ICD-10 code prediction. Contribution/Results: All tasks demonstrate statistically significant performance improvements over baselines (p ≤ 0.001). Over 60% of generated treatment plans were clinically endorsed by board-certified radiation oncologists. Precision, recall, and F1 scores improved substantially for modality selection and code prediction. A physician-led clinical acceptability assessment further confirmed model reliability. This work establishes a methodological framework and empirical foundation for deploying trustworthy, clinically aligned LLMs in precision radiotherapy.
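The summary describes fine-tuning 7B-parameter models with LoRA. As an illustration of the underlying idea only (not the authors' training code), here is a minimal pure-Python sketch of a LoRA-style forward pass: the base weight matrix `W` stays frozen, and a small trainable low-rank update `(alpha / r) * A @ B` is added on top. The matrix shapes and names here are illustrative assumptions.

```python
# Minimal sketch of the LoRA idea: the frozen weight W (in_dim x out_dim)
# is augmented by a trainable low-rank update (alpha / r) * A @ B, where
# A is in_dim x r, B is r x out_dim, and r << min(in_dim, out_dim).
# Pure-Python list-of-lists matrices stand in for a real tensor library.

def matmul(X, Y):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col))
             for col in zip(*Y)] for row in X]

def lora_forward(x, W, A, B, alpha, r):
    """Compute x @ (W + (alpha / r) * A @ B) without modifying W."""
    scale = alpha / r
    delta = [[scale * v for v in row] for row in matmul(A, B)]
    W_eff = [[w + d for w, d in zip(w_row, d_row)]
             for w_row, d_row in zip(W, delta)]
    return matmul(x, W_eff)

# With alpha = 0 the adapter is inert and the output equals x @ W,
# which is why LoRA training can start from the unmodified base model.
```

In real fine-tuning only `A` and `B` would receive gradients, which is what makes adapting a 7B model tractable on modest hardware.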

📝 Abstract
Background: Clinical practice in radiation oncology involves many steps that rely on the dynamic interplay of abundant text data. Large language models (LLMs) have displayed remarkable capabilities in processing complex text information, but their direct application to specialized fields such as radiation oncology remains underexplored.

Purpose: This study investigates whether fine-tuning LLMs with domain knowledge can improve performance on Task (1) treatment regimen generation, Task (2) treatment modality selection (photon, proton, electron, or brachytherapy), and Task (3) ICD-10 code prediction in radiation oncology.

Methods: Data for 15,724 patient cases were extracted. Cases in which the patient had a single diagnostic record and a clearly identifiable primary treatment plan were selected, then preprocessed and manually annotated, yielding 7,903 cases comprising the patient diagnosis, treatment plan, treatment modality, and ICD-10 code. Each case was used to construct a pair consisting of the patient's diagnostic details and an answer (treatment regimen, treatment modality, or ICD-10 code, respectively) for supervised fine-tuning on the three tasks. The open-source LLaMA2-7B and Mistral-7B models were fine-tuned with the Low-Rank Adaptation (LoRA) method. Accuracy and ROUGE-1 scores were reported for the fine-tuned and original models. Clinical evaluation of Task (1) was performed by radiation oncologists, while precision, recall, and F1 score were evaluated for Tasks (2) and (3). One-sided Wilcoxon signed-rank tests were used for statistical analysis.

Results: Fine-tuned LLMs outperformed the original LLMs across all tasks (p ≤ 0.001). Clinical evaluation demonstrated that over 60% of the treatment regimens generated by the fine-tuned LLMs were clinically acceptable. Precision, recall, and F1 score likewise showed improved performance for the fine-tuned LLMs.
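The abstract reports ROUGE-1 scores for the generated treatment regimens. As a reference point, here is a minimal sketch of a ROUGE-1 F1 computation via clipped unigram overlap; the exact tokenization and ROUGE variant used in the paper are not stated, so the whitespace tokenization below is an assumption.

```python
# Sketch of ROUGE-1 F1: unigram overlap between a candidate text and a
# reference text. Counts are clipped (multiset intersection) so repeated
# words in the candidate are not over-credited. Tokenization here is a
# simple lowercase whitespace split, an assumption for illustration.
from collections import Counter

def rouge1_f1(candidate, reference):
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, comparing a hypothetical generated regimen "proton therapy 60 Gy" against a reference "proton therapy 70 Gy" gives precision = recall = 3/4, hence F1 = 0.75.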
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Cancer Radiotherapy
Treatment Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Cancer Radiotherapy
Performance Enhancement
Peilong Wang
City of Hope
Physics · AI · Imaging
Zhengliang Liu
University of Georgia
Natural Language Processing · Medical NLP · Medical Image Analysis · Data Visualization
Yiwei Li
School of Computing, University of Georgia, Athens, GA 30602, USA
J. Holmes
Department of Radiation Oncology, Mayo Clinic Arizona, Phoenix, AZ 85054, USA
Peng Shu
School of Computing, University of Georgia, Athens, GA 30602, USA
Lian Zhang
Student of Electrical Engineering and Computer Science, Vanderbilt University
Intelligent Human Machine Systems · Machine Learning · Artificial Intelligence · Affective Computing · Human-Computer Interactions
Xiang Li
Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
Quanzheng Li
Massachusetts General Hospital, Harvard Medical School
Image Reconstruction · Medical Image Analysis · Deep Learning in Medicine · Multimodality Medical Data Analysis
Brady S. Laughlin
Department of Radiation Oncology, Mayo Clinic Arizona, Phoenix, AZ 85054, USA
Diego Santos Toesca
Department of Radiation Oncology, Mayo Clinic Arizona, Phoenix, AZ 85054, USA
S. Vora
Department of Radiation Oncology, Mayo Clinic Arizona, Phoenix, AZ 85054, USA
Samir H. Patel
Department of Radiation Oncology, Mayo Clinic Arizona, Phoenix, AZ 85054, USA
Terence T. Sio
Department of Radiation Oncology, Mayo Clinic Arizona, Phoenix, AZ 85054, USA
Tianming Liu
Distinguished Research Professor of Computer Science, University of Georgia
Brain · Brain-Inspired AI · LLM · Artificial General Intelligence · Quantum AI
Wei Liu
Department of Radiation Oncology, Mayo Clinic Arizona, Phoenix, AZ 85054, USA