Explanation-Preserving Augmentation for Semi-Supervised Graph Representation Learning

πŸ“… 2024-10-16
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 1
✨ Influential: 0
πŸ€– AI Summary
Existing self-supervised graph representation learning methods predominantly rely on data perturbation for augmentation, neglecting semantic consistency and thereby limiting representation quality. To address this, we propose the Explanation-Preserving Augmentation (EPA) frameworkβ€”the first to integrate graph explanation techniques (e.g., GNNExplainer variants) into graph augmentation design. EPA trains a lightweight explainer on a small set of labeled nodes to identify semantically critical substructures, then generates semantically consistent augmented graphs accordingly. Coupled with subgraph sampling, feature masking, and contrastive learning, EPA enables semantic-aware representation learning within semi-supervised GNNs. Theoretical analysis establishes its robustness to structural noise, while extensive experiments demonstrate that EPA consistently outperforms state-of-the-art semantic-agnostic methods across multiple benchmarks, significantly improving both semantic fidelity and generalization performance.
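The augmentation step described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact algorithm: it assumes the trained explainer outputs a per-edge importance score, treats the top-ranked edges as the "explanation" that must survive augmentation, and randomly drops only the remaining edges. All names and parameters (`keep_ratio`, `drop_prob`) are hypothetical.

```python
import random

def epa_style_augment(edges, edge_importance, keep_ratio=0.5, drop_prob=0.3, seed=0):
    """Sketch of an explanation-preserving augmentation (assumed form):
    edges the explainer deems important are always kept; the remaining
    edges are dropped at random to introduce perturbation."""
    rng = random.Random(seed)
    # Rank edge indices by explainer importance; the top fraction is the explanation.
    ranked = sorted(range(len(edges)), key=lambda i: edge_importance[i], reverse=True)
    n_keep = max(1, int(keep_ratio * len(edges)))
    explanation = set(ranked[:n_keep])
    augmented = []
    for i, e in enumerate(edges):
        # Explanation edges are preserved; others survive with prob. 1 - drop_prob.
        if i in explanation or rng.random() > drop_prob:
            augmented.append(e)
    return augmented

# Toy graph with hypothetical explainer scores: the two high-scoring edges
# form the semantic substructure and are never removed.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
importance = [0.9, 0.8, 0.1, 0.2, 0.05]
aug = epa_style_augment(edges, importance, keep_ratio=0.4, drop_prob=0.5)
```

Generating two such augmentations per graph (e.g., with different seeds or with feature masking added) yields the paired views that the contrastive objective consumes.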

πŸ“ Abstract
Graph representation learning (GRL), enhanced by graph augmentation methods, has emerged as an effective technique for improving performance on a wide range of tasks such as node classification and graph classification. In self-supervised GRL, paired graph augmentations are generated from each graph. The objective is to infer similar representations for augmentations of the same graph, but maximally distinguishable representations for augmentations of different graphs. Analogous to the image and language domains, the desiderata of an ideal augmentation method include both (1) semantics-preservation; and (2) data-perturbation; i.e., an augmented graph should preserve the semantics of its original graph while carrying sufficient variance. However, most existing (un-)/self-supervised GRL methods focus on data perturbation but largely neglect semantics preservation. To address this challenge, in this paper, we propose a novel method, Explanation-Preserving Augmentation (EPA), that leverages graph explanation techniques for generating augmented graphs that can bridge the gap between semantics-preservation and data-perturbation. EPA first uses a small number of labels to train a graph explainer to infer the sub-structures (explanations) that are most relevant to a graph's semantics. These explanations are then used to generate semantics-preserving augmentations for self-supervised GRL, namely EPA-GRL. We demonstrate theoretically, using an analytical example, and through extensive experiments on a variety of benchmark datasets that EPA-GRL outperforms the state-of-the-art (SOTA) GRL methods, which are built upon semantics-agnostic data augmentations.
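The contrastive objective stated in the abstract (similar representations for augmentations of the same graph, distinguishable ones across graphs) is commonly instantiated as an NT-Xent loss, as in GraphCL-style methods. Below is a minimal pure-Python sketch of that standard loss; the paper's exact formulation may differ.

```python
import math

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss over paired augmentations.
    z1[i] and z2[i] are embeddings of two augmentations of graph i;
    each view's positive is its counterpart, all other views are negatives."""
    def cos(a, b):
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return sum(x * y for x, y in zip(a, b)) / (na * nb)

    z = z1 + z2                      # 2N embeddings; i and (i+N) % 2N are positives
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)
        # Denominator: all views except the anchor itself (positive included).
        denom = sum(math.exp(cos(z[i], z[j]) / temperature)
                    for j in range(2 * n) if j != i)
        num = math.exp(cos(z[i], z[pos]) / temperature)
        total += -math.log(num / denom)
    return total / (2 * n)

# Perfectly aligned pairs give a lower loss than mismatched pairs.
aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In practice this would run on batched GNN embeddings (e.g., with PyTorch tensors); the plain-list version here only illustrates the structure of the objective.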
Problem

Research questions and friction points this paper is trying to address.

Preserving semantic meaning in graph augmentations for self-supervised learning
Mitigating the suboptimal representations produced by perturbation-only augmentation methods
Leveraging graph explanations to generate semantics-preserving augmentations in a semi-supervised manner
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explanation-preserving augmentation for graph representation learning
Leveraging graph explanations to ensure semantics-preservation
Semi-supervised framework combining explainer training with augmentation