Efficient Parameter Adaptation for Multi-Modal Medical Image Segmentation and Prognosis

📅 2025-04-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited generalizability of multimodal tumor segmentation and prognosis models caused by PET data scarcity, this paper proposes a cross-modal, low-entanglement, parameter-efficient adaptation framework. Built upon the UNETR and Swin UNETR architectures, it applies LoRA/DoRA fine-tuning to the Transformer attention weights, enabling lightweight adaptation of a CT-only pretrained model to PET (and EHR) modalities without catastrophic forgetting. It further introduces cross-modal decoupled regularization and a lightweight cross-modal fine-tuning mechanism. Experiments demonstrate a +28% improvement in PET segmentation Dice score and prognosis C-index gains of +10% (CT → CT+PET) and +23% (CT → CT+PET+EHR), achieving performance comparable to full-parameter early fusion with only 8% trainable parameters.

📝 Abstract
Cancer detection and prognosis rely heavily on medical imaging, particularly CT and PET scans. Deep Neural Networks (DNNs) have shown promise in tumor segmentation by fusing information from these modalities. However, a critical bottleneck exists: the dependency on concurrent CT-PET data for training and inference, which poses a challenge due to the limited availability of PET scans. Hence, there is a clear need for a flexible and efficient framework that can be trained with the widely available CT scans and still be adapted for PET scans when they become available. In this work, we propose a parameter-efficient multi-modal adaptation (PEMMA) framework for lightweight upgrading of a transformer-based segmentation model trained only on CT scans such that it can be efficiently adapted for use with PET scans when they become available. The framework is further extended to the prognosis task while maintaining the same efficient cross-modal fine-tuning approach. The proposed approach is tested with two well-known segmentation backbones, namely UNETR and Swin UNETR. Our approach offers two main advantages. First, we leverage the inherent modularity of the transformer architecture and perform low-rank adaptation (LoRA) as well as weight-decomposed low-rank adaptation (DoRA) of the attention weights to achieve parameter-efficient adaptation. Second, by minimizing cross-modal entanglement, PEMMA allows updates using only one modality without causing catastrophic forgetting in the other. Our method achieves performance comparable to early fusion, but with only 8% of the trainable parameters, and demonstrates a significant +28% Dice score improvement on PET scans when trained with a single modality. Furthermore, in prognosis, our method improves the concordance index by +10% when adapting a CT-pretrained model to include PET scans, and by +23% when adapting for both PET and EHR data.
Problem

Research questions and friction points this paper is trying to address.

Enables CT-trained models to adapt efficiently to PET scans
Reduces dependency on concurrent CT-PET data for training
Improves prognosis accuracy with parameter-efficient cross-modal adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parameter-efficient multi-modal adaptation (PEMMA) framework
Low-rank and decomposed low-rank adaptation (LoRA/DoRA)
Minimizes cross-modal entanglement for single-modality updates
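As a rough sketch (not the authors' code), the LoRA and DoRA adaptations named above can be illustrated on a single frozen attention weight matrix. LoRA adds a trainable low-rank update `B @ A` to the frozen weight; DoRA additionally decomposes the adapted weight into a trainable magnitude and a normalized direction. Shapes, ranks, and the `alpha` scale below are illustrative assumptions:

```python
import numpy as np

def lora_update(W, A, B, alpha=16.0):
    """LoRA: frozen weight W plus a trainable low-rank update, scaled by alpha/r.

    W: (d_out, d_in) frozen pretrained attention weight
    A: (r, d_in)     trainable down-projection, r << min(d_out, d_in)
    B: (d_out, r)    trainable up-projection, initialized to zeros so the
                     adapted model starts identical to the pretrained one
    """
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)

def dora_update(W, A, B, m, alpha=16.0, eps=1e-8):
    """DoRA: magnitude/direction decomposition of the LoRA-adapted weight.

    m: (1, d_in) trainable per-column magnitude, initialized to the column
       norms of W so the adapted weight initially matches the pretrained one.
    """
    V = lora_update(W, A, B, alpha)                       # direction (unnormalized)
    col_norm = np.linalg.norm(V, axis=0, keepdims=True)   # per-column norms, (1, d_in)
    return m * V / (col_norm + eps)
```

Because `B` starts at zero (and `m` at the column norms of `W`), both adapted weights equal the pretrained weight at initialization; only the small matrices `A`, `B` (and `m` for DoRA) are trained per modality, which is what keeps the trainable-parameter count low.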