🤖 AI Summary
This work addresses the challenge of heterogeneous and incomplete multimodal inputs in clinical brain imaging, where existing AI models typically require a fixed set of input modalities. To overcome this limitation, the authors propose BrainAnytime, a unified pretraining framework trained on 34,899 3D brain scans that supports inference from arbitrary combinations of imaging modalities (e.g., T1, FLAIR, amyloid-PET). The framework employs a shared 3D masked autoencoder (Multi-MAE3D), enhanced with relational cross-modal distillation (RCMD) to model structural–molecular correspondences between MRI and PET, and a parcellation-atlas-guided curriculum masking (PACM) strategy that prioritizes disease-vulnerable brain regions for anatomically informed representation learning. Evaluated on four downstream tasks under five clinically motivated modality settings, BrainAnytime outperforms modality-specific models, missing-modality baselines, and large-scale pretrained brain MRI foundation models in most settings, with relative improvements in average accuracy of 6.2% and 7.0% over the strongest missing-modality baselines on CN vs. AD and CN vs. MCI classification, respectively.
📝 Abstract
Clinical diagnostic workups typically follow a modality escalation pathway: after initial clinical evaluation, clinicians begin with routine structural imaging (e.g., MRI), selectively add sequences such as FLAIR or T2 to refine the differential, and reserve molecular imaging (e.g., amyloid-PET) for cases that remain uncertain after standard evaluation. Consequently, patients are observed with heterogeneous and often incomplete modality subsets. However, most current AI models assume a fixed set of input modalities. In this paper, we present BrainAnytime, a unified framework pretrained on 34,899 3D brain scans from five datasets that supports brain image analysis under arbitrary modality availability, spanning multi-sequence MRI and amyloid-PET. A single model accepts whatever imaging is available, from a lone T1 scan to a full multimodal workup. Pretraining learns structural-molecular correspondences between MRI and PET via cross-modal distillation (RCMD) and prioritizes disease-vulnerable anatomy via atlas-guided curriculum masking (PACM), all within a shared 3D masked autoencoder (Multi-MAE3D). Across four downstream tasks and five clinically motivated modality settings, BrainAnytime outperforms modality-specific models, missing-modality baselines, and large-scale brain MRI pretrained foundation models in most modality settings. Notably, it surpasses the strongest missing-modality baselines with relative improvements of 6.2% and 7.0% in average accuracy on CN vs. AD and CN vs. MCI classification, respectively. Code is available at https://github.com/SDH-Lab/BrainAnytime.
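To make the idea of atlas-guided curriculum masking more concrete, the following is a minimal illustrative sketch, not the authors' PACM implementation. It assumes that each 3D patch carries a region label from a parcellation atlas, that each region has a positive "vulnerability" weight, and that masking shifts from uniform to weight-biased as training progresses. The function name `curriculum_mask`, the region labels, the weights, and the linear schedule are all hypothetical choices made for illustration.

```python
import random


def curriculum_mask(patch_regions, region_weight, progress, mask_ratio=0.75, rng=None):
    """Pick patch indices to mask, biased toward high-weight atlas regions.

    patch_regions: list of atlas region ids, one per patch (hypothetical input).
    region_weight: dict mapping region id -> positive weight (hypothetical).
    progress: training progress in [0, 1]; 0 gives uniform masking,
              1 gives fully weight-biased masking (assumed linear schedule).
    """
    rng = rng or random.Random()
    progress = min(max(progress, 0.0), 1.0)
    n_mask = round(mask_ratio * len(patch_regions))

    keyed = []
    for idx, region in enumerate(patch_regions):
        w = region_weight.get(region, 1.0)
        # Interpolate between a uniform weight (1.0) and the atlas weight.
        w_eff = max((1.0 - progress) + progress * w, 1e-6)
        # Weighted sampling without replacement (Efraimidis-Spirakis keys):
        # a larger weight yields a larger key on average.
        keyed.append((rng.random() ** (1.0 / w_eff), idx))

    keyed.sort(reverse=True)
    return sorted(idx for _, idx in keyed[:n_mask])


# Usage: 8 patches, half of them in a hypothetical high-weight region.
regions = ["vulnerable"] * 4 + ["other"] * 4
weights = {"vulnerable": 10.0, "other": 1.0}
early = curriculum_mask(regions, weights, progress=0.0, rng=random.Random(0))
late = curriculum_mask(regions, weights, progress=1.0, rng=random.Random(0))
```

At `progress=0.0` every patch is equally likely to be masked; as `progress` approaches 1.0, patches in higher-weight regions are masked more often, so the reconstruction objective concentrates on those regions later in training. How the actual method schedules and weights regions is not specified in the abstract.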