🤖 AI Summary
Real-world graph data often arise from mixtures of multiple latent distributions—i.e., distinct graph generation mechanisms—yet existing graph representation learning methods (e.g., contrastive learning, Mixup) fail to explicitly model this mixture structure.
Method: This paper is the first to formally model graph data as a mixture of graphons (nonparametric graph limit models), and proposes a unified framework that estimates mixture components via motif densities (graph moments), enabling disentangled representation learning. It introduces graphon-mixture-aware Mixup augmentation and contrastive learning, theoretically justified by proving that graphons with small cut distance exhibit similar motif densities—ensuring both augmentation validity and discriminative negative sampling.
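To make the mixup idea concrete, here is a minimal toy sketch (not the paper's actual procedure): two estimated mixture-component graphons, represented as step functions (block matrices), are linearly interpolated, and an augmented graph is sampled from the blend. `W1`, `W2`, and `sample_from_graphon` are hypothetical stand-ins for the framework's estimated components and sampler.

```python
import random

def sample_from_graphon(W, n, rng):
    """Sample an n-node simple graph from a step-function graphon W (k x k block matrix)."""
    k = len(W)
    u = [rng.random() for _ in range(n)]          # latent positions in [0, 1]
    blocks = [min(int(x * k), k - 1) for x in u]  # block each node falls into
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Edge (i, j) appears with probability W(u_i, u_j).
            if rng.random() < W[blocks[i]][blocks[j]]:
                A[i][j] = A[j][i] = 1
    return A

# Two hypothetical estimated mixture-component graphons as 2x2 step functions.
W1 = [[0.9, 0.1], [0.1, 0.9]]   # assortative (community-like) component
W2 = [[0.1, 0.7], [0.7, 0.1]]   # disassortative component

# Graphon-level interpolation: the mixed graphon, not raw graphs, defines the
# augmentation, so the sample stays inside a valid generative model.
lam = 0.3
W_mix = [[lam * W1[a][b] + (1 - lam) * W2[a][b] for b in range(2)] for a in range(2)]

rng = random.Random(0)
G_aug = sample_from_graphon(W_mix, 50, rng)     # augmented training graph
```

Interpolating at the graphon level (rather than overlaying adjacency matrices of two sampled graphs) is what makes the augmentation semantically valid: the result is itself a graph drawn from a well-defined generative model.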
Results: Extensive experiments show that the unsupervised method MGCL achieves the top average rank across 8 benchmark datasets, and the supervised method GMAM attains state-of-the-art accuracy on 6 of 7 datasets.
📝 Abstract
Real-world graph datasets often consist of mixtures of populations, where graphs are generated from multiple distinct underlying distributions. However, modern representation learning approaches, such as graph contrastive learning (GCL) and augmentation methods like Mixup, typically overlook this mixture structure. In this work, we propose a unified framework that explicitly models graph data as a mixture of underlying probabilistic graph generative models, each represented by a graphon. To characterize these graphons, we leverage graph moments (motif densities) to cluster graphs arising from the same model. This enables us to disentangle the mixture components and identify their distinct generative mechanisms. This model-aware partitioning benefits two key graph learning tasks: 1) It enables graphon-mixture-aware mixup (GMAM), a data augmentation technique that interpolates in a semantically valid space guided by the estimated graphons, instead of assuming a single graphon per class. 2) For GCL, it enables model-adaptive and principled augmentations. Additionally, by introducing a new model-aware objective, our proposed approach (termed MGCL) improves negative sampling by restricting negatives to graphs from other models. We establish a key theoretical guarantee: a novel, tighter bound showing that graphs sampled from graphons with small cut distance will, with high probability, have similar motif densities. Extensive experiments on benchmark datasets demonstrate strong empirical performance. In unsupervised learning, MGCL achieves state-of-the-art results, obtaining the top average rank across eight datasets. In supervised learning, GMAM consistently outperforms existing strategies, achieving new state-of-the-art accuracy on 6 of 7 datasets.
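To illustrate the motif-density clustering step, here is a minimal hypothetical sketch (not the paper's implementation): graphs are drawn from two constant graphons (Erdős–Rényi models with different edge probabilities), each graph is summarized by two graph moments (edge density and triangle density), and a naive one-dimensional split on edge density stands in for clustering over the full motif vector.

```python
import random

def sample_er_graph(n, p, rng):
    """Sample an Erdős–Rényi graph (the constant graphon W ≡ p) as an adjacency matrix."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                A[i][j] = A[j][i] = 1
    return A

def edge_density(A):
    """Fraction of node pairs that are connected (the edge motif density)."""
    n = len(A)
    m = sum(A[i][j] for i in range(n) for j in range(i + 1, n))
    return 2 * m / (n * (n - 1))

def triangle_density(A):
    """Fraction of node triples that form a triangle (the triangle motif density)."""
    n = len(A)
    t = 0
    for i in range(n):
        for j in range(i + 1, n):
            if A[i][j]:
                for k in range(j + 1, n):
                    if A[j][k] and A[i][k]:
                        t += 1
    return t / (n * (n - 1) * (n - 2) / 6)

rng = random.Random(0)
# A mixture of two latent models: sparse (p = 0.1) and dense (p = 0.4) graphons.
graphs = [sample_er_graph(60, 0.1, rng) for _ in range(10)] + \
         [sample_er_graph(60, 0.4, rng) for _ in range(10)]

# Each graph's moment vector: (edge density, triangle density).
moments = [(edge_density(A), triangle_density(A)) for A in graphs]

# Naive 1-D split on edge density; a real pipeline would cluster the full
# motif-density vectors (e.g., with k-means).
threshold = sum(m[0] for m in moments) / len(moments)
labels = [int(m[0] > threshold) for m in moments]
```

The cluster labels then drive both downstream uses described above: mixup is performed within or between estimated components rather than assuming one graphon per class, and contrastive negatives are restricted to graphs assigned to other components.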