BrainAnytime: Anatomy-Aware Cross-Modal Pretraining for Brain Image Analysis with Arbitrary Modality Availability

📅 2026-05-13
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses heterogeneous and incomplete multimodal inputs in clinical brain imaging, where most existing AI models assume a fixed set of input modalities. The authors propose BrainAnytime, a unified framework pretrained on 34,899 3D brain scans from five datasets that supports inference from arbitrary combinations of multi-sequence MRI (e.g., T1, T2, FLAIR) and amyloid-PET. It builds on a shared 3D masked autoencoder (Multi-MAE3D) with two additions: relational cross-modal distillation (RCMD), which learns structural–molecular correspondences between MRI and PET, and parcellation-atlas-guided curriculum masking (PACM), which prioritizes disease-vulnerable brain regions during pretraining. Across four downstream tasks and five clinically motivated modality settings, BrainAnytime outperforms modality-specific models, missing-modality baselines, and large-scale brain MRI pretrained foundation models in most settings. Against the strongest missing-modality baselines, it achieves relative improvements in average accuracy of 6.2% on CN vs. AD and 7.0% on CN vs. MCI classification.
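To make "arbitrary combinations of imaging modalities" concrete, here is a minimal sketch of how patch tokens from whichever scans are present could be gathered for a shared encoder. It is an illustration only, not the authors' implementation: the `MODALITIES` tuple, the `assemble_tokens` helper, and the toy two-dimensional "tokens" are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical modality vocabulary (assumption); the summary mentions
# multi-sequence MRI and amyloid-PET without fixing an order.
MODALITIES = ("T1", "T2", "FLAIR", "PET")

Token = List[float]


def assemble_tokens(
    scans: Dict[str, Optional[List[Token]]],
) -> Tuple[List[Token], List[int]]:
    """Concatenate patch tokens from whichever modalities are available.

    Returns the tokens plus a parallel list of modality ids, which a shared
    encoder could use to look up a per-modality embedding.
    """
    unknown = set(scans) - set(MODALITIES)
    if unknown:
        raise ValueError(f"unknown modalities: {sorted(unknown)}")

    tokens: List[Token] = []
    modality_ids: List[int] = []
    for idx, name in enumerate(MODALITIES):
        patches = scans.get(name)
        if patches is None:  # missing modality contributes no tokens
            continue
        tokens.extend(patches)
        modality_ids.extend([idx] * len(patches))

    if not tokens:
        raise ValueError("at least one modality is required")
    return tokens, modality_ids


# A lone T1 scan and a T1 + PET workup go through the same function.
t1_only, _ = assemble_tokens({"T1": [[0.1, 0.2]]})
tokens, ids = assemble_tokens({"T1": [[0.1, 0.2]], "PET": [[0.3, 0.4], [0.5, 0.6]]})
```

Because missing modalities simply add no tokens, the same model input path serves every modality subset; no placeholder or imputed volume is needed in this sketch.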
📝 Abstract
Clinical diagnostic workups typically follow a modality escalation pathway: after initial clinical evaluation, clinicians begin with routine structural imaging (e.g., MRI), selectively add sequences such as FLAIR or T2 to refine the differential, and reserve molecular imaging (e.g., amyloid-PET) for cases that remain uncertain after standard evaluation. Consequently, patients are observed with heterogeneous and often incomplete modality subsets. However, most current AI models assume a fixed set of input modalities. In this paper, we present BrainAnytime, a unified framework pretrained on 34,899 3D brain scans from five datasets that supports brain image analysis under arbitrary modality availability, spanning multi-sequence MRI and amyloid-PET. A single model accepts whatever imaging is available, from a lone T1 scan to a full multimodal workup. Pretraining learns structural-molecular correspondences between MRI and PET via cross-modal distillation (RCMD) and prioritizes disease-vulnerable anatomy via atlas-guided curriculum masking (PACM), all within a shared 3D masked autoencoder (Multi-MAE3D). Across four downstream tasks and five clinically motivated modality settings, BrainAnytime outperforms modality-specific models, missing-modality baselines, and large-scale brain MRI pretrained foundation models in most modality settings. Notably, it surpasses the strongest missing-modality baselines with relative improvements of 6.2% and 7.0% in average accuracy on CN vs. AD and CN vs. MCI classification, respectively. Code is available at https://github.com/SDH-Lab/BrainAnytime.
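The idea of atlas-guided curriculum masking can be sketched as weighted patch sampling in which the weights shift from uniform toward region vulnerability as training progresses. The snippet below is a rough sketch under stated assumptions, not the PACM procedure from the paper: the `curriculum_mask` function, the linear `progress` schedule, the per-region `vulnerability` scores, and the 0.75 default mask ratio are all hypothetical choices made for illustration.

```python
import random
from typing import Dict, List, Sequence


def curriculum_mask(
    patch_regions: Sequence[int],
    vulnerability: Dict[int, float],
    progress: float,
    mask_ratio: float = 0.75,  # assumed default, not taken from the paper
    seed: int = 0,
) -> List[bool]:
    """Choose which patches to mask, biased toward vulnerable atlas regions.

    patch_regions[i] is the atlas region label of patch i.
    vulnerability maps a region label to a score in [0, 1] (hypothetical).
    progress in [0, 1]: 0 gives uniform masking, 1 gives fully weighted masking.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")

    rng = random.Random(seed)
    eps = 1e-3  # keeps every sampling weight strictly positive
    keys = []
    for i, region in enumerate(patch_regions):
        v = vulnerability.get(region, 0.0)
        weight = (1.0 - progress) + progress * (eps + v)
        # Weighted sampling without replacement (Efraimidis-Spirakis keys):
        # the patches with the largest u ** (1 / weight) are selected.
        keys.append((rng.random() ** (1.0 / weight), i))

    n_mask = round(mask_ratio * len(patch_regions))
    masked = {i for _, i in sorted(keys, reverse=True)[:n_mask]}
    return [i in masked for i in range(len(patch_regions))]


# Toy example: region 1 is treated as vulnerable, region 0 is not.
regions = [0] * 50 + [1] * 50
late_mask = curriculum_mask(regions, {0: 0.0, 1: 1.0}, progress=1.0, mask_ratio=0.5)
```

Selecting a fixed number of patches keeps the mask ratio constant across the curriculum, so only where the masked patches fall changes over training, not how many there are.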
Problem

Research questions and friction points this paper is trying to address.

arbitrary modality availability
brain image analysis
missing modalities
multimodal learning
clinical heterogeneity
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-modal distillation
atlas-guided curriculum masking
masked autoencoder
arbitrary modality availability
multimodal brain imaging
👥 Authors
Guangqian Yang
Department of Biomedical Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China
Tong Ding
PhD student in Computer Science, Harvard University
Representation Learning, Computer Vision, Multimodal Learning, Machine Learning for Health
Wenlong Hou
Department of Biomedical Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China
Yue Xun
Department of Biomedical Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China
Ye Du
The Hong Kong Polytechnic University
Qian Niu
UT Austin
Condensed matter physics
Shujun Wang
The Hong Kong Polytechnic University
AI for Healthcare, Smart Ageing, AI for Science