MultiViT2: A Data-augmented Multimodal Neuroimaging Prediction Framework via Latent Diffusion Model

📅 2025-06-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the limited generalizability of neuroimaging models in multi-center schizophrenia diagnosis. To tackle this, we propose a multimodal deep learning framework integrating structural and functional MRI. Methodologically, we pioneer the use of latent diffusion models (LDMs) for synthetic data augmentation under cross-center and small-sample settings; employ vision transformers (ViTs) to extract modality-specific representations; design a hierarchical multimodal feature fusion mechanism; and incorporate self-supervised pretraining to enhance representation robustness. Evaluated on multiple independent multi-center datasets, our framework achieves significant improvements in classification accuracy (average +5.2%) and demonstrates strong cross-center transferability and clinical deployability. It establishes a scalable, highly generalizable paradigm for AI-driven neuroimaging diagnosis of psychiatric disorders.

Technology Category

Application Category

📝 Abstract
Multimodal medical imaging integrates diverse data types, such as structural and functional neuroimaging, to provide complementary insights that enhance deep learning predictions and improve outcomes. This study focuses on a neuroimaging prediction framework based on both structural and functional neuroimaging data. We propose a next-generation prediction model, extbf{MultiViT2}, which combines a pretrained representative learning base model with a vision transformer backbone for prediction output. Additionally, we developed a data augmentation module based on the latent diffusion model that enriches input data by generating augmented neuroimaging samples, thereby enhancing predictive performance through reduced overfitting and improved generalizability. We show that MultiViT2 significantly outperforms the first-generation model in schizophrenia classification accuracy and demonstrates strong scalability and portability.
Problem

Research questions and friction points this paper is trying to address.

Develops a multimodal neuroimaging prediction framework for enhanced accuracy
Integrates structural and functional data to improve deep learning predictions
Uses latent diffusion model for data augmentation to reduce overfitting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines pretrained model with vision transformer
Uses latent diffusion for data augmentation
Enhances prediction accuracy and generalizability
🔎 Similar Papers
No similar papers found.
B
Bi Yuda
Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS)
J
Jia Sihan
Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS)
Yutong Gao
Yutong Gao
Nanjing University of Science and Technology
computer visionNLPAIGC
A
Abrol Anees
Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS)
Zening Fu
Zening Fu
Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS)
C
Calhoun Vince
Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS)