PLGC: Pseudo-Labeled Graph Condensation

📅 2026-01-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Training graph neural networks on large-scale graphs is computationally expensive, and existing graph condensation methods rely heavily on clean labels, leading to significant performance degradation under label scarcity, noise, or distribution shifts. This work proposes a self-supervised graph condensation framework that generates latent pseudo-labels from node embeddings without requiring ground-truth labels. By jointly optimizing prototypes and node assignments, the method constructs a compact synthetic graph whose structural and feature statistics closely match those of the original graph. Theoretical analysis demonstrates that the approach effectively preserves the original graph structure and ensures embedding alignment. Experiments show that the method matches state-of-the-art supervised approaches on clean data and substantially outperforms all baselines under label noise, exhibiting remarkable robustness in both node classification and link prediction tasks.

📝 Abstract
Large graph datasets make training graph neural networks (GNNs) computationally costly. Graph condensation methods address this by generating small synthetic graphs that approximate the original data. However, existing approaches rely on clean, supervised labels, which limits their reliability when labels are scarce, noisy, or inconsistent. We propose Pseudo-Labeled Graph Condensation (PLGC), a self-supervised framework that constructs latent pseudo-labels from node embeddings and optimizes condensed graphs to match the original graph's structural and feature statistics, without requiring ground-truth labels. PLGC offers three key contributions: (1) a diagnosis of why supervised condensation fails under label noise and distribution shift; (2) a label-free condensation method that jointly learns latent prototypes and node assignments; (3) theoretical guarantees showing that pseudo-labels preserve latent structural statistics of the original graph and ensure accurate embedding alignment. Empirically, across node classification and link prediction tasks, PLGC achieves competitive performance with state-of-the-art supervised condensation methods on clean datasets and exhibits substantial robustness under label noise, often outperforming all baselines by a significant margin. Our findings highlight the practical and theoretical advantages of self-supervised graph condensation in noisy or weakly labeled environments.
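The abstract describes jointly learning latent prototypes and node assignments to produce pseudo-labels without ground truth. A minimal sketch of that idea is alternating optimization over embeddings, in the style of k-means: fix prototypes and assign each node to its nearest one, then fix assignments and recenter each prototype. The function name `latent_pseudo_labels`, the farthest-point initialization, and all parameters below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def latent_pseudo_labels(embeddings, k, iters=50):
    """Illustrative sketch: alternate between node-to-prototype
    assignment and prototype recentering to get pseudo-labels.
    NOT the published PLGC procedure -- a k-means-style stand-in."""
    # Farthest-point initialization keeps the initial prototypes spread out.
    protos = [embeddings[0]]
    for _ in range(k - 1):
        dists = np.min(
            [np.linalg.norm(embeddings - p, axis=1) for p in protos], axis=0
        )
        protos.append(embeddings[dists.argmax()])
    protos = np.array(protos)

    for _ in range(iters):
        # Assignment step: each node joins its nearest prototype.
        d = np.linalg.norm(embeddings[:, None, :] - protos[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # Prototype step: recenter each prototype on its assigned nodes.
        for c in range(k):
            mask = labels == c
            if mask.any():
                protos[c] = embeddings[mask].mean(axis=0)
    return labels, protos

# Toy usage: two well-separated clusters of synthetic node embeddings.
rng = np.random.default_rng(0)
emb = np.vstack([
    rng.normal(0.0, 0.1, (10, 4)),  # cluster A
    rng.normal(3.0, 0.1, (10, 4)),  # cluster B
])
labels, protos = latent_pseudo_labels(emb, k=2)
```

In a full condensation pipeline, these pseudo-labels would replace ground-truth labels when matching the synthetic graph's structural and feature statistics to the original graph's; the sketch only covers the labeling step.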
Problem

Research questions and friction points this paper is trying to address.

graph condensation
label noise
self-supervised learning
graph neural networks
pseudo-labels
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph Condensation
Self-supervised Learning
Pseudo-labeling
Graph Neural Networks
Label Noise Robustness