🤖 AI Summary
This study addresses the limited generalizability of neuroimaging models in multi-center schizophrenia diagnosis. To tackle this, we propose a multimodal deep learning framework that integrates structural and functional MRI. Methodologically, we pioneer the use of latent diffusion models (LDMs) for synthetic data augmentation in cross-center and small-sample settings; employ vision transformers (ViTs) to extract modality-specific representations; design a hierarchical multimodal feature fusion mechanism; and incorporate self-supervised pretraining to strengthen representation robustness. Evaluated on multiple independent multi-center datasets, the framework yields significant gains in classification accuracy (average +5.2%) and demonstrates strong cross-center transferability and potential for clinical deployment. It establishes a scalable, highly generalizable paradigm for AI-driven neuroimaging diagnosis of psychiatric disorders.
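As a rough illustration of the architecture described above, the PyTorch sketch below wires two modality-specific ViT-style encoders (one for structural, one for functional inputs) into a late fusion head for classification. It is a minimal sketch only: the module names (`ModalityViT`, `HierarchicalFusionClassifier`), patch sizes, dimensions, and simple concatenation-based fusion are assumptions for readability, not the paper's actual hierarchical fusion mechanism, and the self-supervised pretraining stage is not reproduced here.

```python
import torch
import torch.nn as nn


class PatchEmbed3D(nn.Module):
    """Turn a 3D volume into a sequence of patch tokens (toy stand-in)."""
    def __init__(self, in_ch=1, patch=16, dim=256):
        super().__init__()
        self.proj = nn.Conv3d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x):                      # x: (B, C, D, H, W)
        x = self.proj(x)                       # (B, dim, d, h, w)
        return x.flatten(2).transpose(1, 2)    # (B, n_tokens, dim)


class ModalityViT(nn.Module):
    """Modality-specific transformer encoder producing one pooled vector per scan."""
    def __init__(self, dim=256, depth=4, heads=8):
        super().__init__()
        self.embed = PatchEmbed3D(dim=dim)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):
        tokens = self.encoder(self.embed(x))
        return tokens.mean(dim=1)              # mean-pool tokens into a representation


class HierarchicalFusionClassifier(nn.Module):
    """Fuse structural and functional representations, then classify patient vs. control."""
    def __init__(self, dim=256, n_classes=2):
        super().__init__()
        self.smri_vit = ModalityViT(dim)
        self.fmri_vit = ModalityViT(dim)
        self.fuse = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU())
        self.head = nn.Linear(dim, n_classes)

    def forward(self, smri, fmri):
        z = torch.cat([self.smri_vit(smri), self.fmri_vit(fmri)], dim=-1)
        return self.head(self.fuse(z))


model = HierarchicalFusionClassifier()
smri = torch.randn(2, 1, 64, 64, 64)           # toy structural volumes
fmri = torch.randn(2, 1, 64, 64, 64)           # toy functional-derived maps
logits = model(smri, fmri)                     # (2, n_classes)
```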
📝 Abstract
Multimodal medical imaging integrates diverse data types, such as structural and functional neuroimaging, to provide complementary information that strengthens deep learning predictions and improves outcomes. This study focuses on a neuroimaging prediction framework built on both structural and functional neuroimaging data. We propose a next-generation prediction model, MultiViT2, which combines a pretrained representation-learning base model with a vision transformer backbone for prediction output. Additionally, we develop a data augmentation module based on a latent diffusion model that enriches the input data by generating augmented neuroimaging samples, thereby enhancing predictive performance through reduced overfitting and improved generalizability. We show that MultiViT2 significantly outperforms the first-generation model in schizophrenia classification accuracy and demonstrates strong scalability and portability.
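To make the augmentation idea concrete, the sketch below runs a DDPM-style ancestral sampling loop in a latent space and decodes the result into synthetic feature maps that could be mixed into the training set. The tiny `denoiser` and `decoder` stubs, latent size, and noise schedule are placeholders standing in for pretrained components, not the paper's latent diffusion module, and timestep conditioning is omitted for brevity.

```python
import torch
import torch.nn as nn

# Toy stand-ins for components that would be pretrained in the real pipeline.
denoiser = nn.Sequential(nn.Linear(128, 256), nn.SiLU(), nn.Linear(256, 128))  # predicts noise from z_t
decoder = nn.Linear(128, 64 * 64)          # maps latents back to (flattened) image space

T = 50
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)


@torch.no_grad()
def sample_latents(n):
    """DDPM-style ancestral sampling in latent space."""
    z = torch.randn(n, 128)                # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = denoiser(z)                  # predicted noise (timestep conditioning omitted)
        mean = (z - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(z) if t > 0 else torch.zeros_like(z)
        z = mean + torch.sqrt(betas[t]) * noise
    return z


synthetic = decoder(sample_latents(8)).reshape(8, 1, 64, 64)   # synthetic neuroimaging feature maps
# augmented_batch = torch.cat([real_batch, synthetic], dim=0)  # mix synthetic samples into training data
```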