🤖 AI Summary
Structural connectomes exhibit both continuous variation (e.g., connection strength) and discrete variation (e.g., scanning site), which conventional dimensionality reduction methods such as PCA and standard VAEs, with their purely continuous latent spaces, do not model jointly. To address this, we propose Mix-VAE, a variational autoencoder with a hybrid latent space that simultaneously learns continuous latent variables (capturing gradients of structural connectivity strength) and discrete latent variables (encoding categorical confounds such as acquisition site) without supervision. Evaluated on 5,761 connectomes, each from a unique subject, the discrete latents recover site labels with an Adjusted Rand Index (ARI) of 0.65, significantly outperforming PCA and a standard VAE followed by clustering (p<0.05). The results suggest that a hybrid latent space offers an interpretable, factorized way to separate sources of heterogeneity in brain connectivity.
📝 Abstract
Structural connectomes are detailed graphs that map how different brain regions are physically connected, offering critical insight into aging, cognition, and neurodegenerative diseases. However, these connectomes are high-dimensional and densely interconnected, which makes them difficult to interpret and analyze at scale. Low-dimensional representations from methods such as PCA and autoencoders are often used to capture major sources of variation, but their latent spaces are generally continuous and cannot fully reflect the mixed nature of variability in connectomes, which includes both continuous factors (e.g., connectivity strength) and discrete factors (e.g., imaging site). Motivated by this, we propose a variational autoencoder (VAE) with a hybrid latent space that jointly models the discrete and continuous components. We analyze a large dataset of 5,761 connectomes from six Alzheimer's disease studies with ten acquisition protocols. Each connectome represents a single scan from a unique subject (3,579 females, 2,182 males; aged 22 to 102), of whom 4,338 are cognitively normal, 809 have mild cognitive impairment (MCI), and 614 have Alzheimer's disease (AD). Each connectome contains 121 brain regions defined by the BrainCOLOR atlas. We train our hybrid VAE in an unsupervised way and characterize what each latent component captures. We find that the discrete space is particularly effective at capturing subtle site-related differences, achieving an Adjusted Rand Index (ARI) of 0.65 with site labels and significantly outperforming both PCA and a standard VAE followed by clustering (p<0.05). These results demonstrate that the hybrid latent space can disentangle distinct sources of variability in connectomes in an unsupervised manner, offering potential for large-scale connectome analysis.
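For readers unfamiliar with the Adjusted Rand Index reported above, the following is a minimal sketch of how agreement between discrete latent assignments and site labels can be scored. It is not the authors' evaluation code: the function name `adjusted_rand_index`, the variables `site_labels` and `latent_codes`, and the toy data are all hypothetical, and the sketch assumes each scan has already been assigned to its most likely discrete latent category.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)

    # Contingency counts: items in (true class i, predicted cluster j).
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Item pairs grouped together in both labelings, in the true
    # labeling alone, and in the predicted labeling alone.
    together_both = sum(comb(c, 2) for c in pair_counts.values())
    together_true = sum(comb(c, 2) for c in true_counts.values())
    together_pred = sum(comb(c, 2) for c in pred_counts.values())

    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: labelings trivially agree

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings use one group
    return (together_both - expected) / (maximum - expected)


# Hypothetical toy data (not from the paper): acquisition site per scan
# and the most likely discrete latent category per scan.
site_labels = ["A", "A", "A", "B", "B", "B"]
latent_codes = [2, 2, 2, 0, 0, 1]
ari = adjusted_rand_index(site_labels, latent_codes)
print(f"ARI = {ari:.2f}")
```

An ARI of 1 means the two partitions match exactly (up to a renaming of cluster labels), and values near 0 indicate agreement no better than chance. This is why the index suits unsupervised discrete latents, whose category numbering is arbitrary.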