Rethinking and Accelerating Graph Condensation: A Training-Free Approach with Class Partition

📅 2024-05-22
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
Existing graph condensation (GC) methods rely on bi-level optimization with iterative gradient computation, incurring high computational overhead and slow training. To address this, we propose CGC, a training-free GC framework. The core insight is to reformulate GC as a class partition problem: the coarse class-to-class feature matching of prior methods is refined into a novel "class-to-node" matching paradigm, a pre-defined graph structure yields a closed-form solution for the condensed node features, and the class partition itself can be solved directly by any clustering algorithm, eliminating gradient-based optimization entirely. On Ogbn-products, CGC condenses the graph in about 30 seconds, a 10²–10⁴× speedup over state-of-the-art methods, while improving downstream GNN accuracy by up to 4.2%. CGC is thus a scalable, theoretically tractable, and training-free framework for large-scale graph condensation.

📝 Abstract
The increasing prevalence of large-scale graphs poses a significant challenge for graph neural network training, attributed to their substantial computational requirements. In response, graph condensation (GC) emerges as a promising data-centric solution aiming to substitute the large graph with a small yet informative condensed graph to facilitate data-efficient GNN training. However, existing GC methods suffer from intricate optimization processes, necessitating excessive computing resources and training time. In this paper, we revisit existing GC optimization strategies and identify two pervasive issues therein: (1) various GC optimization strategies converge to coarse-grained class-level node feature matching between the original and condensed graphs; (2) existing GC methods rely on a Siamese graph network architecture that requires time-consuming bi-level optimization with iterative gradient computations. To overcome these issues, we propose a training-free GC framework termed Class-partitioned Graph Condensation (CGC), which refines the node distribution matching from the class-to-class paradigm into a novel class-to-node paradigm, transforming the GC optimization into a class partition problem which can be efficiently solved by any clustering method. Moreover, CGC incorporates a pre-defined graph structure to enable a closed-form solution for condensed node features, eliminating the need for back-and-forth gradient descent in existing GC approaches. Extensive experiments demonstrate that CGC achieves an exceedingly efficient condensation process with advanced accuracy. Compared with the state-of-the-art GC methods, CGC condenses the Ogbn-products graph within 30 seconds, achieving a speedup ranging from $10^2\times$ to $10^4\times$ and increasing accuracy by up to 4.2%.
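The class-partition idea described above can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: it uses scikit-learn's `KMeans` as the (arbitrary) clustering algorithm, partitions each class's nodes into sub-clusters, and takes the per-cluster mean as the closed-form condensed feature. The actual CGC method additionally propagates node features over a pre-defined graph structure before partitioning, which is omitted here.

```python
import numpy as np
from sklearn.cluster import KMeans

def condense_features(X, y, nodes_per_class=2, seed=0):
    """Training-free condensation sketch: partition each class's nodes
    into sub-clusters, then use each cluster's mean as one condensed
    node feature (the closed-form solution for that partition)."""
    X_cond, y_cond = [], []
    for c in np.unique(y):
        X_c = X[y == c]
        k = min(nodes_per_class, len(X_c))
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_c)
        X_cond.append(km.cluster_centers_)  # per-cluster mean, no gradients
        y_cond.append(np.full(k, c))
    return np.vstack(X_cond), np.concatenate(y_cond)

# toy example: 100 nodes, 8-dim features, 3 classes -> 6 condensed nodes
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = rng.integers(0, 3, size=100)
X_s, y_s = condense_features(X, y, nodes_per_class=2)
print(X_s.shape, y_s.shape)
```

Because each condensed feature is a simple mean over a partition, the whole procedure runs in one pass with no bi-level optimization, which is what makes the 10²–10⁴× speedup plausible.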
Problem

Research questions and friction points this paper is trying to address.

Graph Condensation
Computational Efficiency
Feature Matching
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-Free Graph Condensation
Class-to-Node Feature Matching
Pre-defined Graph Structure