🤖 AI Summary
To address distribution mismatch among subgraph contrastive pairs and insufficient structural fidelity in self-supervised graph representation learning (SSL-GRL) for high-dimensional graph data, this paper proposes a contrastive learning framework based on subgraph Gaussian embedding. A novel subgraph Gaussian embedding module maps subgraphs into a structured Gaussian space, preserving the characteristics of the input subgraphs while keeping the distribution of the generated subgraphs under control. Subgraph similarity is measured jointly with the Wasserstein distance, which compares node feature distributions, and the Gromov-Wasserstein distance, which compares topological structure; together they make the contrastive learning process more robust. Experiments on multiple benchmark graph datasets show that the proposed approach outperforms or is competitive with state-of-the-art methods, supporting the importance of subgraph-level distribution modeling in SSL-GRL.
📝 Abstract
Graph Representation Learning (GRL) is a fundamental task in machine learning, aiming to encode high-dimensional graph-structured data into low-dimensional vectors. Self-Supervised Learning (SSL) methods are widely used in GRL because they avoid expensive human annotation. In this work, we propose a novel Subgraph Gaussian Embedding Contrast (SubGEC) method. Our approach introduces a subgraph Gaussian embedding module, which adaptively maps subgraphs to a structured Gaussian space, preserving the characteristics of the input subgraphs while generating subgraphs with a controlled distribution. We then employ optimal transport distances, specifically the Wasserstein and Gromov-Wasserstein distances, to measure the similarity between subgraphs, enhancing the robustness of the contrastive learning process. Extensive experiments across multiple benchmarks demonstrate that SubGEC outperforms or performs competitively with state-of-the-art approaches. Our findings provide insights into the design of SSL methods for GRL, emphasizing the importance of the distribution of the generated contrastive pairs.
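As a rough illustration of the two optimal transport distances named in the abstract, the Python sketch below compares two tiny subgraphs. It is not the authors' implementation: the function names and toy data are hypothetical, and it assumes equal-size subgraphs with uniform node weights so that couplings can be searched by brute force over node permutations. Under those assumptions the Wasserstein value is exact, since the optimum of an assignment problem is attained at a permutation, while the Gromov-Wasserstein value is in general only an upper bound because the search is restricted to permutation couplings.

```python
from itertools import permutations


def sq_dist(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def wasserstein_uniform(X, Y):
    """Squared 2-Wasserstein distance between two equal-size sets of
    node features with uniform weights (exact; brute-force assignment)."""
    k = len(X)
    assert len(Y) == k, "this sketch assumes equal-size subgraphs"
    return min(
        sum(sq_dist(X[i], Y[p[i]]) for i in range(k)) / k
        for p in permutations(range(k))
    )


def gromov_wasserstein_perm(A, B):
    """Gromov-Wasserstein discrepancy with squared loss between two
    structure matrices, restricted to permutation couplings
    (an upper bound on the true value in general)."""
    k = len(A)
    assert len(B) == k, "this sketch assumes equal-size subgraphs"
    return min(
        sum(
            (A[i][j] - B[p[i]][p[j]]) ** 2
            for i in range(k)
            for j in range(k)
        ) / k**2
        for p in permutations(range(k))
    )


# Hypothetical 3-node subgraphs: node features and adjacency matrices.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]  # same features, nodes reordered

path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]             # path with centre node 1
path_relabelled = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # same path, centre node 0
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

print("feature distance:", wasserstein_uniform(X, Y))
print("structure distance (path vs relabelled path):",
      gromov_wasserstein_perm(path, path_relabelled))
print("structure distance (path vs triangle):",
      gromov_wasserstein_perm(path, triangle))
```

Both distances are invariant to node ordering, so reordered features and a relabelled path give zero, whereas the path and the triangle differ in structure. Brute force is only practical for a handful of nodes; realistic subgraph sizes would call for a dedicated optimal transport solver.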