MAFM^3: Modular Adaptation of Foundation Models for Multi-Modal Medical AI

📅 2025-11-14
🤖 AI Summary
Foundation models in medical imaging often generalize poorly across multimodal, multitask, and cross-domain scenarios because of data scarcity. To address this, we propose a lightweight modular adaptation framework that activates task-specific components—such as classification, prognosis prediction, and segmentation—on the fly during inference, supports multimodal inputs (e.g., PET/CT), and enables cross-task collaborative reasoning. Unlike conventional static adaptation paradigms, our framework endows pretrained models with dynamic evolvability, enhancing both generalization capability and deployment efficiency. We validate the framework by extending a chest CT foundation model to jointly support prognosis prediction and segmentation; with PET scans incorporated, it achieves a 5% improvement in Dice score over the respective baseline. Results demonstrate efficacy and scalability in multitask, multimodal settings, establishing a flexible pathway for adaptive clinical AI deployment.

📝 Abstract
Foundation models are trained on extensive datasets to capture the general trends of a domain. However, in medical imaging, the scarcity of data makes pre-training for every domain, modality, or task challenging. Instead of building separate models, we propose MAFM^3 (Modular Adaptation of Foundation Models for Multi-Modal Medical AI), a framework that enables a single foundation model to expand into diverse domains, tasks, and modalities through lightweight modular components. These components serve as specialized skill sets that allow the system to flexibly activate the appropriate capability at inference time, depending on the input type or clinical objective. Unlike conventional adaptation methods that treat each new task or modality in isolation, MAFM^3 provides a unified and expandable framework for efficient multitask and multimodality adaptation. Empirically, we validate our approach by adapting a chest CT foundation model, initially trained for classification, with prognosis and segmentation modules. Our results show improved performance on both tasks. Furthermore, by incorporating PET scans, MAFM^3 achieved a 5% improvement in Dice score compared to the respective baselines. These findings establish that foundation models, when equipped with modular components, are not inherently constrained to their initial training scope but can evolve into multitask, multimodality systems for medical imaging. The code implementation of this work can be found at https://github.com/Areeb2735/CTscan_prognosis_VLM
Problem

Research questions and friction points this paper is trying to address.

Addresses data scarcity in medical imaging for foundation model training
Enables single foundation model adaptation across domains and modalities
Provides modular framework for multitask medical imaging without separate models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular adaptation expands foundation models into multiple domains
Lightweight components enable flexible activation for different modalities
Unified framework supports multitask and multimodality medical imaging
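The modular-activation idea described above can be sketched as a registry of lightweight task modules attached to a single shared encoder, with the caller choosing which skills to run at inference time. This is an illustrative sketch only, not the authors' implementation: all names (`foundation_encoder`, `MAFM3`, the toy task modules) are hypothetical, and the real system would use trained neural components rather than the stand-in functions shown here.

```python
# Illustrative sketch of modular task activation at inference.
# All names are hypothetical; the real MAFM^3 modules are trained networks.
from typing import Callable, Dict, List


def foundation_encoder(volume: List[float]) -> List[float]:
    # Stand-in for a frozen pretrained encoder mapping a scan to features.
    return [x * 0.5 for x in volume]


class MAFM3:
    """Routes shared encoder features through task modules chosen at inference."""

    def __init__(self) -> None:
        self.modules: Dict[str, Callable[[List[float]], object]] = {}

    def register(self, task: str, module: Callable[[List[float]], object]) -> None:
        # Adding a new skill touches neither the encoder nor existing modules.
        self.modules[task] = module

    def infer(self, volume: List[float], tasks: List[str]) -> Dict[str, object]:
        features = foundation_encoder(volume)  # computed once, shared by all tasks
        return {t: self.modules[t](features) for t in tasks}


model = MAFM3()
model.register("classification", lambda f: "high-risk" if sum(f) > 1 else "low-risk")
model.register("segmentation", lambda f: [1 if x > 0.4 else 0 for x in f])

# Only the requested task modules are activated for this input.
out = model.infer([0.2, 1.0, 0.6], tasks=["classification", "segmentation"])
```

The design point the sketch captures is that the encoder runs once per input while task heads are selected per request, so extending the system to a new task or modality means registering one more module rather than retraining or duplicating the foundation model.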