🤖 AI Summary
Existing brain disorder prediction methods suffer from poor-quality multimodal graph construction, performance degradation as the graph scale grows, and difficulty in modeling complex associations between imaging and non-imaging data. To address these challenges, we propose an end-to-end multimodal graph neural network framework. First, we introduce Modality-Reward Representation Learning (MRRL), a novel task-driven approach for dynamic multimodal graph construction. Second, we design Adaptive Cross-Modal Graph Learning (ACMGL), which unifies Graph U-Net and Graph Transformer architectures to jointly model disentangled modality-specific and modality-shared representations. Third, we integrate a variational autoencoder (VAE) with a multimodal alignment mechanism to enhance robustness. Evaluated on the ABIDE and ADHD-200 datasets, our method improves diagnostic accuracy by 3.2–5.8% over state-of-the-art approaches. The source code is publicly available.
📝 Abstract
Graph deep learning (GDL) has demonstrated impressive performance in predicting population-based brain disorders (BDs) by integrating both imaging and non-imaging data. However, the effectiveness of GDL-based methods depends heavily on the quality of the multi-modal population graphs and tends to degrade as the graph scale increases. Furthermore, these methods often restrict interactions between imaging and non-imaging data to node-edge interactions within the graph, overlooking complex inter-modal correlations and leading to suboptimal outcomes. To overcome these challenges, we propose MM-GTUNets, an end-to-end graph-transformer-based multi-modal graph deep learning (MMGDL) framework designed for large-scale brain disorder prediction. Specifically, to effectively leverage rich disease-related multi-modal information, we introduce Modality Reward Representation Learning (MRRL), which adaptively constructs population graphs using a reward system. Additionally, we employ a variational autoencoder to reconstruct latent representations of non-imaging features aligned with imaging features. Building on this, we propose Adaptive Cross-Modal Graph Learning (ACMGL), which captures critical modality-specific and modality-shared features through a unified GTUNet encoder, combining the advantages of Graph U-Net and Graph Transformer, together with a feature fusion module. We validated our method on two public multi-modal datasets, ABIDE and ADHD-200, demonstrating its superior performance in diagnosing BDs. Our code is available at https://github.com/NZWANG/MM-GTUNets.
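To make the notion of a multi-modal population graph concrete, the following is a minimal, illustrative sketch (not the authors' MRRL reward system) of how such a graph is commonly built in population-based GDL: each subject is a node, imaging features define a similarity kernel, and agreement on non-imaging phenotypic attributes modulates the edge weights. The function name, parameters, and thresholding scheme are assumptions for illustration only.

```python
import numpy as np

def build_population_graph(imaging, phenotypes, sigma=1.0, threshold=0.3):
    """Illustrative multi-modal population-graph construction.

    imaging:    (n_subjects, n_features) imaging feature vectors.
    phenotypes: (n_subjects, n_attrs) non-imaging attributes
                (e.g. sex, acquisition site), integer-encoded.
    Returns a weighted adjacency matrix of shape (n_subjects, n_subjects).
    """
    # Imaging affinity: Gaussian kernel over pairwise squared distances.
    d2 = np.sum((imaging[:, None, :] - imaging[None, :, :]) ** 2, axis=-1)
    img_affinity = np.exp(-d2 / (2.0 * sigma ** 2))

    # Non-imaging affinity: fraction of matching phenotypic attributes.
    pheno_affinity = np.mean(
        phenotypes[:, None, :] == phenotypes[None, :, :], axis=-1
    )

    # Edge weight = imaging similarity modulated by phenotypic agreement.
    adj = img_affinity * pheno_affinity
    np.fill_diagonal(adj, 0.0)   # remove self-loops
    adj[adj < threshold] = 0.0   # sparsify weak connections
    return adj
```

A static graph like this is what MM-GTUNets aims to improve on: here the kernel width and threshold are fixed by hand, whereas MRRL learns the graph adaptively, driven by the prediction task.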