🤖 AI Summary
Existing brain graph pretraining methods, both contrastive and masked-autoencoder based, rely on random dropping or masking for augmentation, which disrupts semantically meaningful connectivity patterns in brain graphs and hypergraphs. Their graph-level readout and reconstruction schemes also fail to capture global structural information, limiting the robustness of the learned representations. This work proposes a unified diffusion-based pretraining framework that addresses both limitations. First, diffusion guides structure-aware dropping and masking, preserving brain graph semantics while keeping the augmentations diverse. Second, diffusion enables a topology-aware graph-level readout and node-level global reconstruction, letting graph embeddings and masked nodes aggregate information from globally related regions. Evaluated on multiple neuroimaging datasets totaling over 25,000 subjects and 60,000 scans, covering various mental disorders and brain atlases, the method shows consistent performance improvements.
📝 Abstract
With the growing interest in foundation models for brain signals, graph-based pretraining has emerged as a promising paradigm for learning transferable representations from connectome data. However, existing contrastive and masked autoencoder methods typically rely on naive random dropping or masking for augmentation, which is ill-suited for brain graphs and hypergraphs as it disrupts semantically meaningful connectivity patterns. Moreover, commonly used graph-level readout and reconstruction schemes fail to capture global structural information, limiting the robustness of learned representations. In this work, we propose a unified diffusion-based pretraining framework that addresses both limitations. First, diffusion is designed to guide structure-aware dropping and masking strategies, preserving brain graph semantics while maintaining effective pretraining diversity. Second, diffusion enables topology-aware graph-level readout and node-level global reconstruction by allowing graph embeddings and masked nodes to aggregate information from globally related regions. Extensive experiments across multiple neuroimaging datasets with over 25,000 subjects and 60,000 scans involving various mental disorders and brain atlases demonstrate consistent performance improvements.
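The two roles of diffusion described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the diffusion operator, so personalized PageRank is assumed here as a stand-in, and the function names (`ppr_diffusion`, `diffusion_guided_edge_drop`, `diffusion_readout`), the restart probability `alpha`, and the drop rule are all hypothetical choices made for illustration.

```python
import numpy as np


def ppr_diffusion(adj, alpha=0.15):
    """Personalized PageRank diffusion (an assumed choice of operator).

    Returns S = alpha * (I - (1 - alpha) * T)^-1, where T is the
    symmetrically normalized adjacency D^-1/2 A D^-1/2.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    t = inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    n = adj.shape[0]
    return alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * t)


def diffusion_guided_edge_drop(adj, diff, drop_rate=0.2, rng=None):
    """Drop roughly drop_rate of the undirected edges, sampling edges with a
    low diffusion score more often than edges with a high one."""
    rng = np.random.default_rng() if rng is None else rng
    adj = np.asarray(adj, dtype=float)
    rows, cols = np.triu_indices_from(adj, k=1)
    present = adj[rows, cols] != 0
    rows, cols = rows[present], cols[present]
    n_drop = int(round(drop_rate * rows.size))
    out = adj.copy()
    if n_drop == 0:
        return out
    score = diff[rows, cols]
    # Hypothetical rule: lower diffusion score means higher drop probability.
    weight = score.max() - score + 1e-8
    prob = weight / weight.sum()
    chosen = rng.choice(rows.size, size=n_drop, replace=False, p=prob)
    # Zero both directions so the graph stays symmetric.
    out[rows[chosen], cols[chosen]] = 0.0
    out[cols[chosen], rows[chosen]] = 0.0
    return out


def diffusion_readout(features, diff):
    """Graph-level embedding: mix node features through the diffusion matrix,
    then mean-pool over nodes."""
    return (diff @ np.asarray(features, dtype=float)).mean(axis=0)
```

In this sketch, `diffusion_guided_edge_drop` produces an augmented view that tends to keep strongly diffusion-connected region pairs, and `diffusion_readout` lets every region contribute to the graph embedding in proportion to its global connectivity instead of being pooled uniformly. A masked-node reconstruction could reuse the same matrix by reading a masked node's target from the diffused features of the remaining nodes, but that step is left out here.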