MedSeqFT: Sequential Fine-tuning Foundation Models for 3D Medical Image Segmentation

📅 2025-09-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of sequentially emerging downstream tasks in medical image segmentation, where balancing knowledge retention and task adaptation remains difficult, this paper proposes a sequential, progressive fine-tuning framework. Methodologically, it integrates maximum-data-similarity sample selection, LoRA-based low-rank adaptation, and feature-level knowledge distillation, augmented by data distribution alignment and loss landscape analysis to ensure training stability. Unlike parallel fine-tuning (which isolates tasks) or multi-task fine-tuning (which requires simultaneous access to all datasets), the framework enables incremental task integration using only current-task data. Experiments across ten 3D medical segmentation benchmarks demonstrate an average Dice score improvement of 3.0%, significantly outperforming state-of-the-art continual learning methods. Moreover, the framework exhibits superior cross-task generalization on unseen tasks, validating its effectiveness in preserving prior knowledge while adapting to new tasks.
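The summary names "maximum data similarity-based sample selection" but gives no implementation details. A minimal sketch of one plausible instantiation, assuming MDS is realized as cosine similarity between each downstream sample's encoder features and the centroid of the pre-training feature distribution (the function name and the centroid criterion are illustrative, not the paper's exact formulation):

```python
import numpy as np

def select_mds_samples(pretrain_feats, task_feats, k):
    """Pick the k downstream samples whose encoder features are most
    similar to the centroid of the pre-training feature distribution.

    Hypothetical sketch: cosine similarity to the pre-training centroid
    is one plausible reading of 'maximum data similarity'; the paper's
    exact criterion may differ.
    """
    centroid = pretrain_feats.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    normed = task_feats / np.linalg.norm(task_feats, axis=1, keepdims=True)
    sims = normed @ centroid            # cosine similarity per sample
    return np.argsort(sims)[::-1][:k]   # indices of the k most similar

rng = np.random.default_rng(0)
pretrain = rng.normal(size=(100, 16))   # stand-in pre-training features
task = rng.normal(size=(40, 16))        # stand-in downstream features
idx = select_mds_samples(pretrain, task, k=5)
print(len(idx))  # 5
```

The selected subset would then stand in for the (unavailable) pre-training data when regularizing later fine-tuning stages.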

📝 Abstract
Foundation models have become a promising paradigm for advancing medical image analysis, particularly for segmentation tasks where downstream applications often emerge sequentially. Existing fine-tuning strategies, however, remain limited: parallel fine-tuning isolates tasks and fails to exploit shared knowledge, while multi-task fine-tuning requires simultaneous access to all datasets and struggles with incremental task integration. To address these challenges, we propose MedSeqFT, a sequential fine-tuning framework that progressively adapts pre-trained models to new tasks while refining their representational capacity. MedSeqFT introduces two core components: (1) Maximum Data Similarity (MDS) selection, which identifies downstream samples most representative of the original pre-training distribution to preserve general knowledge, and (2) Knowledge and Generalization Retention Fine-Tuning (K&G RFT), a LoRA-based knowledge distillation scheme that balances task-specific adaptation with the retention of pre-trained knowledge. Extensive experiments on two multi-task datasets covering ten 3D segmentation tasks demonstrate that MedSeqFT consistently outperforms state-of-the-art fine-tuning strategies, yielding substantial performance gains (e.g., an average Dice improvement of 3.0%). Furthermore, evaluations on two unseen tasks (COVID-19-20 and Kidney) verify that MedSeqFT enhances transferability, particularly for tumor segmentation. Visual analyses of loss landscapes and parameter variations further highlight the robustness of MedSeqFT. These results establish sequential fine-tuning as an effective, knowledge-retentive paradigm for adapting foundation models to evolving clinical tasks. Code will be released.
Problem

Research questions and friction points this paper is trying to address.

Addressing sequential fine-tuning limitations in medical image segmentation
Preserving pre-trained knowledge while adapting to new tasks
Enhancing transferability for incremental clinical segmentation applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sequential fine-tuning framework for incremental medical tasks
MDS selection preserves knowledge via representative samples
LoRA-based distillation balances adaptation and knowledge retention
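The two ingredients named above can be illustrated with a hedged numpy sketch: a standard LoRA-style low-rank update to a frozen weight, and a feature-level distillation penalty that keeps the adapted features close to the pre-trained model's. The shapes and the alpha/r scaling follow the common LoRA formulation; the paper's exact K&G RFT losses are not reproduced here.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha, r):
    """y = x @ (W + (alpha/r) * A @ B): frozen weight W plus a
    trainable low-rank update (standard LoRA parameterization)."""
    return x @ (W + (alpha / r) * (A @ B))

def feature_distill_loss(student_feats, teacher_feats):
    """Mean-squared distance between adapted-model and frozen-model
    features; a generic stand-in for feature-level distillation."""
    return float(np.mean((student_feats - teacher_feats) ** 2))

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 8, 2
W = rng.normal(size=(d_in, d_out))     # frozen pre-trained weight
A = rng.normal(size=(d_in, r)) * 0.01  # trainable down-projection
B = np.zeros((r, d_out))               # trainable up-projection, zero-init
x = rng.normal(size=(4, d_in))

y = lora_forward(x, W, A, B, alpha=4.0, r=r)
# With B initialised to zero the LoRA branch contributes nothing,
# so the adapted output equals the frozen model's output exactly
# and the distillation penalty starts at zero.
loss = feature_distill_loss(y, x @ W)
print(loss)  # 0.0
```

During fine-tuning only A and B would be trained, with the distillation term pulling the adapted features back toward the frozen teacher's.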
👥 Authors
Yiwen Ye
School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, Shaanxi, 710072, China
Yicheng Wu
Monash University, Clayton, VIC 3168, Australia
Xiangde Luo
Stanford University
Medical image analysis, Computer vision, Computational pathology
He Zhang
RMIT, Melbourne, VIC 3001, Australia
Ziyang Chen
Peking University
Quantum key distribution, Quantum random number generation
Ting Dang
Senior Lecturer in AI for Health, The University of Melbourne
Mobile Health, Audio Processing, Affective Computing, Time Series Modelling, Wearable Sensing
Yanning Zhang
Northwestern Polytechnical University
Computer Vision
Yong Xia
School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, Shaanxi, 710072, China