🤖 AI Summary
This study addresses the limited generalizability of neuroimaging models in multi-center schizophrenia diagnosis. To tackle this, we propose a multimodal deep learning framework that integrates structural and functional MRI. Methodologically, we pioneer the use of latent diffusion models (LDMs) for synthetic data augmentation in cross-center and small-sample settings; employ vision transformers (ViTs) to extract modality-specific representations; design a hierarchical multimodal feature fusion mechanism; and incorporate self-supervised pretraining to strengthen representation robustness. Evaluated on multiple independent multi-center datasets, the framework yields significant gains in classification accuracy (average +5.2%) and demonstrates strong cross-center transferability and potential for clinical deployment. It establishes a scalable, highly generalizable paradigm for AI-driven neuroimaging diagnosis of psychiatric disorders.
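As a rough illustration of the architecture described above, the PyTorch sketch below wires two modality-specific ViT-style encoders (one for structural, one for functional inputs) into a late fusion head for classification. It is a minimal sketch only: the module names (`ModalityViT`, `HierarchicalFusionClassifier`), patch sizes, dimensions, and simple concatenation-based fusion are assumptions for readability, not the paper's actual hierarchical fusion mechanism, and the self-supervised pretraining stage is not reproduced here.

```python
import torch
import torch.nn as nn


class PatchEmbed3D(nn.Module):
    """Turn a 3D volume into a sequence of patch tokens (toy stand-in)."""
    def __init__(self, in_ch=1, patch=16, dim=256):
        super().__init__()
        self.proj = nn.Conv3d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x):                      # x: (B, C, D, H, W)
        x = self.proj(x)                       # (B, dim, d, h, w)
        return x.flatten(2).transpose(1, 2)    # (B, n_tokens, dim)


class ModalityViT(nn.Module):
    """Modality-specific transformer encoder producing one pooled vector per scan."""
    def __init__(self, dim=256, depth=4, heads=8):
        super().__init__()
        self.embed = PatchEmbed3D(dim=dim)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):
        tokens = self.encoder(self.embed(x))
        return tokens.mean(dim=1)              # mean-pool tokens into a representation


class HierarchicalFusionClassifier(nn.Module):
    """Fuse structural and functional representations, then classify patient vs. control."""
    def __init__(self, dim=256, n_classes=2):
        super().__init__()
        self.smri_vit = ModalityViT(dim)
        self.fmri_vit = ModalityViT(dim)
        self.fuse = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU())
        self.head = nn.Linear(dim, n_classes)

    def forward(self, smri, fmri):
        z = torch.cat([self.smri_vit(smri), self.fmri_vit(fmri)], dim=-1)
        return self.head(self.fuse(z))


model = HierarchicalFusionClassifier()
smri = torch.randn(2, 1, 64, 64, 64)           # toy structural volumes
fmri = torch.randn(2, 1, 64, 64, 64)           # toy functional-derived maps
logits = model(smri, fmri)                     # (2, n_classes)
```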
📝 Abstract
Multimodal medical imaging integrates diverse data types, such as structural and functional neuroimaging, to provide complementary information that strengthens deep learning predictions and improves outcomes. This study focuses on a neuroimaging prediction framework built on both structural and functional neuroimaging data. We propose a next-generation prediction model, MultiViT2, which combines a pretrained representation-learning base model with a vision transformer backbone for prediction output. Additionally, we develop a data augmentation module based on a latent diffusion model that enriches the input data by generating augmented neuroimaging samples, thereby enhancing predictive performance through reduced overfitting and improved generalizability. We show that MultiViT2 significantly outperforms the first-generation model in schizophrenia classification accuracy and demonstrates strong scalability and portability.
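To make the augmentation idea concrete, the sketch below runs a DDPM-style ancestral sampling loop in a latent space and decodes the result into synthetic feature maps that could be mixed into the training set. The tiny `denoiser` and `decoder` stubs, latent size, and noise schedule are placeholders standing in for pretrained components, not the paper's latent diffusion module, and timestep conditioning is omitted for brevity.

```python
import torch
import torch.nn as nn

# Toy stand-ins for components that would be pretrained in the real pipeline.
denoiser = nn.Sequential(nn.Linear(128, 256), nn.SiLU(), nn.Linear(256, 128))  # predicts noise from z_t
decoder = nn.Linear(128, 64 * 64)          # maps latents back to (flattened) image space

T = 50
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)


@torch.no_grad()
def sample_latents(n):
    """DDPM-style ancestral sampling in latent space."""
    z = torch.randn(n, 128)                # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = denoiser(z)                  # predicted noise (timestep conditioning omitted)
        mean = (z - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(z) if t > 0 else torch.zeros_like(z)
        z = mean + torch.sqrt(betas[t]) * noise
    return z


synthetic = decoder(sample_latents(8)).reshape(8, 1, 64, 64)   # synthetic neuroimaging feature maps
# augmented_batch = torch.cat([real_batch, synthetic], dim=0)  # mix synthetic samples into training data
```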