🤖 AI Summary
Real-world graph data often arise from mixtures of multiple latent distributions—i.e., distinct graph generation mechanisms—yet existing graph representation learning methods (e.g., contrastive learning, Mixup) fail to explicitly model this mixture structure.
Method: This paper is the first to formally model graph data as a mixture of graphons (nonparametric graph limit models), and proposes a unified framework that estimates mixture components via motif densities (graph moments), enabling disentangled representation learning. It introduces graphon-mixture-aware Mixup augmentation and contrastive learning, theoretically justified by proving that graphons with small cut distance exhibit similar motif densities—ensuring both augmentation validity and discriminative negative sampling.
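To make the mixup idea concrete, here is a minimal toy sketch (not the paper's actual procedure): two estimated mixture-component graphons, represented as step functions (block matrices), are linearly interpolated, and an augmented graph is sampled from the blend. `W1`, `W2`, and `sample_from_graphon` are hypothetical stand-ins for the framework's estimated components and sampler.

```python
import random

def sample_from_graphon(W, n, rng):
    """Sample an n-node simple graph from a step-function graphon W (k x k block matrix)."""
    k = len(W)
    u = [rng.random() for _ in range(n)]          # latent positions in [0, 1]
    blocks = [min(int(x * k), k - 1) for x in u]  # block each node falls into
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Edge (i, j) appears with probability W(u_i, u_j).
            if rng.random() < W[blocks[i]][blocks[j]]:
                A[i][j] = A[j][i] = 1
    return A

# Two hypothetical estimated mixture-component graphons as 2x2 step functions.
W1 = [[0.9, 0.1], [0.1, 0.9]]   # assortative (community-like) component
W2 = [[0.1, 0.7], [0.7, 0.1]]   # disassortative component

# Graphon-level interpolation: the mixed graphon, not raw graphs, defines the
# augmentation, so the sample stays inside a valid generative model.
lam = 0.3
W_mix = [[lam * W1[a][b] + (1 - lam) * W2[a][b] for b in range(2)] for a in range(2)]

rng = random.Random(0)
G_aug = sample_from_graphon(W_mix, 50, rng)     # augmented training graph
```

Interpolating at the graphon level (rather than overlaying adjacency matrices of two sampled graphs) is what makes the augmentation semantically valid: the result is itself a graph drawn from a well-defined generative model.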
Results: Extensive experiments show that the unsupervised method MGCL achieves the top average rank across 8 benchmark datasets, and the supervised method GMAM attains state-of-the-art accuracy on 6 of 7 datasets.
📝 Abstract
Real-world graph datasets often consist of mixtures of populations, where graphs are generated from multiple distinct underlying distributions. However, modern representation learning approaches, such as graph contrastive learning (GCL) and augmentation methods like Mixup, typically overlook this mixture structure. In this work, we propose a unified framework that explicitly models graph data as a mixture of underlying probabilistic graph generative models, each represented by a graphon. To characterize these graphons, we leverage graph moments (motif densities) to cluster graphs arising from the same model. This enables us to disentangle the mixture components and identify their distinct generative mechanisms. This model-aware partitioning benefits two key graph learning tasks: 1) It enables graphon-mixture-aware mixup (GMAM), a data augmentation technique that interpolates in a semantically valid space guided by the estimated graphons, instead of assuming a single graphon per class. 2) For GCL, it enables model-adaptive and principled augmentations. Additionally, by introducing a new model-aware objective, our proposed approach (termed MGCL) improves negative sampling by restricting negatives to graphs from other models. We establish a key theoretical guarantee: a novel, tighter bound showing that graphs sampled from graphons with small cut distance will, with high probability, have similar motif densities. Extensive experiments on benchmark datasets demonstrate strong empirical performance. In unsupervised learning, MGCL achieves state-of-the-art results, obtaining the top average rank across eight datasets. In supervised learning, GMAM consistently outperforms existing strategies, achieving new state-of-the-art accuracy on 6 of 7 datasets.
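To illustrate the motif-density clustering step, here is a minimal hypothetical sketch (not the paper's implementation): graphs are drawn from two constant graphons (Erdős–Rényi models with different edge probabilities), each graph is summarized by two graph moments (edge density and triangle density), and a naive one-dimensional split on edge density stands in for clustering over the full motif vector.

```python
import random

def sample_er_graph(n, p, rng):
    """Sample an Erdős–Rényi graph (the constant graphon W ≡ p) as an adjacency matrix."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                A[i][j] = A[j][i] = 1
    return A

def edge_density(A):
    """Fraction of node pairs that are connected (the edge motif density)."""
    n = len(A)
    m = sum(A[i][j] for i in range(n) for j in range(i + 1, n))
    return 2 * m / (n * (n - 1))

def triangle_density(A):
    """Fraction of node triples that form a triangle (the triangle motif density)."""
    n = len(A)
    t = 0
    for i in range(n):
        for j in range(i + 1, n):
            if A[i][j]:
                for k in range(j + 1, n):
                    if A[j][k] and A[i][k]:
                        t += 1
    return t / (n * (n - 1) * (n - 2) / 6)

rng = random.Random(0)
# A mixture of two latent models: sparse (p = 0.1) and dense (p = 0.4) graphons.
graphs = [sample_er_graph(60, 0.1, rng) for _ in range(10)] + \
         [sample_er_graph(60, 0.4, rng) for _ in range(10)]

# Each graph's moment vector: (edge density, triangle density).
moments = [(edge_density(A), triangle_density(A)) for A in graphs]

# Naive 1-D split on edge density; a real pipeline would cluster the full
# motif-density vectors (e.g., with k-means).
threshold = sum(m[0] for m in moments) / len(moments)
labels = [int(m[0] > threshold) for m in moments]
```

The cluster labels then drive both downstream uses described above: mixup is performed within or between estimated components rather than assuming one graphon per class, and contrastive negatives are restricted to graphs assigned to other components.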