Extending Graph Condensation to Multi-Label Datasets: A Benchmark Study

📅 2024-12-23

🏛️ Trans. Mach. Learn. Res.

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Problem: Training graph neural networks (GNNs) on large multi-label graphs, such as social and biological networks, is computationally expensive, and existing graph condensation methods are restricted to single-label settings. Method: This paper extends graph condensation to the multi-label regime by changing two components: the synthetic graph is initialized with a multi-label-aware strategy based on K-Center selection, and the condensation objective uses binary cross-entropy loss, which removes the assumption that each node has exactly one class. Contribution/Results: The paper establishes a benchmark for multi-label graph condensation. Experiments on eight real-world multi-label datasets show that GCond combined with K-Center initialization and binary cross-entropy loss achieves the best performance overall, improving the scalability and efficiency of GNN training on multi-label graph data.
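To make the K-Center initialization idea concrete, here is a minimal sketch of the generic greedy farthest-first heuristic applied to node embeddings. It is not the paper's implementation: the function name `k_center_greedy`, the toy embeddings, and the choice to copy each selected node's full multi-label vector into the synthetic set are all assumptions made for illustration.

```python
import math

def k_center_greedy(points, k, first=0):
    """Greedy farthest-first traversal (hypothetical helper).

    Returns k indices; each new center is the point farthest from
    the centers chosen so far.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centers = [first]
    # Distance from every point to its nearest chosen center.
    nearest = [math.dist(p, points[first]) for p in points]
    while len(centers) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return centers

# Toy data (assumed): 2-D node embeddings and multi-hot label vectors.
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 9.0)]
labels = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0], [1, 1, 0]]

# Initialize a 3-node synthetic set from the selected real nodes,
# keeping every label each selected node carries.
selected = k_center_greedy(embeddings, k=3)
synthetic_features = [embeddings[i] for i in selected]
synthetic_labels = [labels[i] for i in selected]
```

Because selection here depends only on distances between embeddings, it does not need the one-class-per-node grouping that single-label initialization schemes typically rely on; how the paper combines selection with label information is not stated in this summary.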

📝 Abstract

As graph data grows increasingly complex, training graph neural networks (GNNs) on large-scale datasets presents significant challenges, including computational resource constraints, data redundancy, and transmission inefficiencies. While existing graph condensation techniques have shown promise in addressing these issues, they are predominantly designed for single-label datasets, where each node is associated with a single class label. However, many real-world applications, such as social network analysis and bioinformatics, involve multi-label graph datasets, where one node can have several related labels. To address this problem, we extend traditional graph condensation approaches to accommodate multi-label datasets by introducing modifications to synthetic dataset initialization and condensing optimization. Through experiments on eight real-world multi-label graph datasets, we demonstrate the effectiveness of our method. In our experiments, the GCond framework, combined with K-Center initialization and binary cross-entropy loss (BCELoss), achieves the best performance overall. This benchmark for multi-label graph condensation not only enhances the scalability and efficiency of GNNs for multi-label graph data, but also offers substantial benefits for diverse real-world applications.
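The reason binary cross-entropy suits the multi-label case is that it scores every (node, label) pair as an independent yes/no decision, so a node may carry any number of labels. The sketch below is a plain-Python illustration of that loss, not the paper's code. The function name `bce_with_logits` and the toy values are assumptions, and it takes raw logits (as PyTorch's `BCEWithLogitsLoss` does) rather than probabilities.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over all (node, label) pairs.

    logits:  per-node lists of raw scores, one per label
    targets: per-node multi-hot lists (1 = label present)
    """
    total, count = 0.0, 0
    for row_logits, row_targets in zip(logits, targets):
        for z, y in zip(row_logits, row_targets):
            # Numerically stable form of
            # -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
            total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count

# Toy example (assumed values): two nodes, three labels each.
logits = [[2.0, -1.0, 0.5], [-3.0, 4.0, 0.0]]
targets = [[1, 0, 1], [0, 1, 1]]  # both nodes carry two labels
loss = bce_with_logits(logits, targets)
```

A softmax cross-entropy loss, by contrast, forces the label scores of a node to compete for a single class, which is the single-label assumption the paper relaxes.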

Problem

Research questions and friction points this paper is trying to address.

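Existing graph condensation methods are designed for single-label datasets and do not handle multi-label graphs

Training GNNs on large-scale graphs is limited by computational cost, data redundancy, and transmission inefficiency

Real-world applications such as social network analysis and bioinformatics need scalable multi-label graph learning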

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends graph condensation to multi-label datasets

Uses K-Center initialization and BCELoss optimization

Enhances GNN scalability for real-world applications

🔎 Similar Papers

Rethinking and Accelerating Graph Condensation: A Training-Free Approach with Class Partition