Multimodal Masked Autoencoder Pre-training for 3D MRI-Based Brain Tumor Analysis with Missing Modalities

📅 2025-05-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal MRI often suffers from missing modalities due to acquisition constraints or clinical limitations; however, existing pretraining methods require complete modality inputs and necessitate separate models for each modality combination, severely limiting practicality and scalability. To address this, we propose BM-MAE—a unified 3D multimodal MRI pretraining framework based on masked autoencoding—capable of plug-and-play adaptation to arbitrary modality subsets without architectural modification or subset-specific retraining, while enabling high-fidelity reconstruction of missing modalities. Its core innovation lies in a decoupled design integrating cross-modal attention with modality-specific embeddings, jointly optimizing shared representation learning and modality-specific characteristics. Evaluated on downstream tasks including brain tumor segmentation and classification, BM-MAE significantly outperforms training from scratch and matches or exceeds the performance of dedicated pretrained baselines trained independently for each modality combination.
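The decoupled design described above — shared patch projection plus learned modality-specific embeddings, with random masking before a shared Transformer encoder — can be sketched roughly as follows. This is an illustrative reading of the summary, not the authors' code (see the linked repository for the real implementation); class names, dimensions (e.g. 6³-voxel patches giving `patch_dim=216`), and the mask ratio are assumptions.

```python
import torch
import torch.nn as nn

class MultimodalMAESketch(nn.Module):
    """Hedged sketch of a BM-MAE-style pre-training forward pass: each available
    MRI modality is patchified into tokens, tagged with a learned
    modality-specific embedding, and a random subset of tokens is kept (the rest
    masked) before a shared Transformer encoder. Hyperparameters are illustrative."""

    def __init__(self, n_modalities=4, patch_dim=216, embed_dim=128, mask_ratio=0.75):
        super().__init__()
        self.proj = nn.Linear(patch_dim, embed_dim)  # shared patch projection
        # one learned embedding vector per modality (e.g. T1, T1ce, T2, FLAIR)
        self.modality_embed = nn.Parameter(torch.zeros(n_modalities, embed_dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.mask_ratio = mask_ratio

    def forward(self, patches_per_modality):
        # patches_per_modality: list of (modality_index, (B, N, patch_dim)) pairs;
        # absent modalities are simply omitted -- no placeholder inputs needed.
        tokens = []
        for m_idx, patches in patches_per_modality:
            tokens.append(self.proj(patches) + self.modality_embed[m_idx])
        tokens = torch.cat(tokens, dim=1)  # (B, total_N, D): joint token sequence
        B, N, D = tokens.shape
        keep = max(1, int(N * (1 - self.mask_ratio)))
        # random per-sample masking: keep only a subset of token positions
        idx = torch.rand(B, N).argsort(dim=1)[:, :keep]
        visible = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, D))
        # self-attention over visible tokens of all modalities = cross-modal attention
        return self.encoder(visible)  # (B, keep, D)
```

Because tokens from every available modality share one sequence, the encoder's self-attention acts across modalities, while the added modality embeddings preserve each modality's identity — one plausible reading of "jointly optimizing shared representation learning and modality-specific characteristics".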

📝 Abstract
Multimodal magnetic resonance imaging (MRI) constitutes the first line of investigation for clinicians in the care of brain tumors, providing crucial insights for surgery planning, treatment monitoring, and biomarker identification. Pre-training on large datasets has been shown to help models learn transferable representations and adapt with minimal labeled data. This behavior is especially valuable in medical imaging, where annotations are often scarce. However, applying this paradigm to multimodal medical data introduces a challenge: most existing approaches assume that all imaging modalities are available during both pre-training and fine-tuning. In practice, missing modalities often occur due to acquisition issues, specialist unavailability, or specific experimental designs on small in-house datasets. Consequently, a common approach involves training a separate model for each desired modality combination, making the process both resource-intensive and impractical for clinical use. Therefore, we introduce BM-MAE, a masked image modeling pre-training strategy tailored for multimodal MRI data. The same pre-trained model seamlessly adapts to any combination of available modalities, extracting rich representations that capture both intra- and inter-modal information. This allows fine-tuning on any subset of modalities without requiring architectural changes, while still benefiting from a model pre-trained on the full set of modalities. Extensive experiments show that the proposed pre-training strategy outperforms or remains competitive with baselines that require separate pre-training for each modality subset, while substantially surpassing training from scratch on several downstream tasks. Additionally, it can quickly and efficiently reconstruct missing modalities, highlighting its practical value. Code and trained models are available at: https://github.com/Lucas-rbnt/bmmae
Problem

Research questions and friction points this paper is trying to address.

Addresses missing modalities in 3D MRI-based brain tumor analysis
Eliminates need for separate models per modality combination
Enables reconstruction of missing modalities efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

BM-MAE enables pre-training with missing MRI modalities
Same model adapts to any modality combination
Reconstructs missing modalities efficiently
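The missing-modality reconstruction idea in the bullets above can be sketched as a standard MAE-style decoding step: features encoded from the available modalities are concatenated with learnable mask tokens tagged by the *missing* modality's embedding, and a light decoder predicts that modality's patches. This is an assumed mechanism inferred from the summary, not the paper's exact decoder; all names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class MissingModalityDecoderSketch(nn.Module):
    """Hedged sketch: reconstruct an absent modality by decoding mask tokens
    that carry that modality's learned embedding, attending to the encoded
    tokens of whatever modalities were actually acquired."""

    def __init__(self, n_modalities=4, embed_dim=128, patch_dim=216, n_patches=10):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.modality_embed = nn.Parameter(torch.zeros(n_modalities, embed_dim))
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True),
            num_layers=1,
        )
        self.head = nn.Linear(embed_dim, patch_dim)  # token -> voxel patch
        self.n_patches = n_patches

    def forward(self, encoded, missing_idx):
        # encoded: (B, N_visible, D) features of the available modalities
        B = encoded.size(0)
        # mask tokens tagged with the MISSING modality's embedding act as queries
        queries = (self.mask_token.expand(B, self.n_patches, -1)
                   + self.modality_embed[missing_idx])
        x = self.decoder(torch.cat([encoded, queries], dim=1))
        # predicted patches of the missing modality
        return self.head(x[:, -self.n_patches:])
```

Because only the modality embedding changes per query, the same decoder can reconstruct any missing modality without subset-specific retraining, matching the plug-and-play claim.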
Lucas Robinet
Oncopole Claudius Regaud, IRT Saint-Exupéry
Multimodal Deep Learning · Oncology Research
Ahmad Berjaoui
IRT Saint Exupéry
Elizabeth Cohen-Jonathan Moyal
Oncopole Claudius Régaud, INSERM Cancer Research Center of Toulouse, Toulouse