🤖 AI Summary
This paper addresses self-supervised representation learning for graph data with few or no labels, proposing a unified framework that jointly integrates the generative and contrastive paradigms. Methodologically, it introduces a community-aware contrastive learning mechanism operating at both the node and graph levels; employs multi-granularity graph augmentations—feature masking, node perturbation, and edge perturbation—to construct robust views; and enables end-to-end co-optimization of a generative loss (graph reconstruction) and a contrastive loss (semantic alignment). The key innovations are explicitly embedding community-structure priors into the contrastive objective and jointly optimizing both learning objectives in a single architecture. Experiments on multiple open benchmark datasets show consistent gains over state-of-the-art methods, with improvements of 0.23%–2.01% across node classification, node clustering, and link prediction.
📝 Abstract
Self-supervised learning (SSL) on graphs generates node and graph representations (i.e., embeddings) that can be used for downstream tasks such as node classification, node clustering, and link prediction. Graph SSL is particularly useful in scenarios with limited or no labeled data. Existing SSL methods predominantly follow contrastive or generative paradigms, each excelling at different tasks: contrastive methods typically perform well on classification, while generative methods often excel at link prediction. In this paper, we present a novel architecture for graph SSL that integrates the strengths of both approaches. Our framework introduces community-aware node-level contrastive learning, which yields more robust and effective positive and negative node pairs, alongside graph-level contrastive learning to capture global semantic information. Additionally, we employ a comprehensive augmentation strategy that combines feature masking, node perturbation, and edge perturbation, enabling robust and diverse representation learning. By incorporating these enhancements, our model achieves superior performance across multiple tasks, including node classification, clustering, and link prediction. Evaluations on open benchmark datasets demonstrate that our model outperforms state-of-the-art methods, achieving a performance lift of 0.23%–2.01% depending on the task and dataset.
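To make the augmentation strategy and the joint objective concrete, here is a minimal NumPy sketch. It is an illustration only, not the paper's implementation: the function names, the perturbation rates, the random-projection "encoders", and the loss weight `0.5` are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_features(X, rate=0.2):
    """Feature masking: zero out a random subset of feature columns."""
    keep = rng.random(X.shape[1]) >= rate
    return X * keep

def perturb_edges(A, rate=0.1):
    """Edge perturbation: flip a random symmetric subset of entries."""
    flip = rng.random(A.shape) < rate
    flip = np.triu(flip, 1)
    flip = flip | flip.T
    return np.where(flip, 1.0 - A, A)

def perturb_nodes(A, X, rate=0.1):
    """Node perturbation: drop nodes by zeroing features and incident edges."""
    keep = rng.random(A.shape[0]) >= rate
    return A * np.outer(keep, keep), X * keep[:, None]

# Toy undirected graph: 6 nodes, 4-dimensional features.
n, d = 6, 4
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T
X = rng.random((n, d))

# Build an augmented view by composing all three perturbations.
A_v, X_v = perturb_nodes(perturb_edges(A), mask_features(X))

# Joint objective (sketch): generative reconstruction + contrastive alignment.
W = rng.random((d, 3))          # stand-in for a shared GNN encoder (assumed)
Z1, Z2 = X_v @ W, X @ W         # embeddings of the augmented and original views
recon = np.mean((A - Z1 @ Z1.T) ** 2)   # generative loss: rebuild the adjacency
sim = np.sum(Z1 * Z2, axis=1) / (
    np.linalg.norm(Z1, axis=1) * np.linalg.norm(Z2, axis=1) + 1e-8)
contrast = -np.mean(sim)                # contrastive loss: align the two views
loss = recon + 0.5 * contrast           # weight 0.5 is an assumed hyperparameter
```

In the paper's setting the two losses are co-optimized end to end; this sketch only shows how a single combined scalar objective over two augmented views could be formed.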