Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark

๐Ÿ“… 2025-06-14
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing graph neural networks (GNNs) exhibit insufficient robustness under label noise, particularly failing to model complex instance-dependent noise (IDN) prevalent in real-world graph data; current studies largely assume class-dependent noise, limiting practical applicability. Method: We introduce BeGINโ€”the first IDN benchmark for graph dataโ€”and propose an LLM-driven, fine-grained noise modeling framework that enables semantic-aware label perturbation. We systematically evaluate mainstream GNNs under IDN, identify node-level parameterization as critical for noise resilience, and design a novel noise detection and robust learning framework. Contribution/Results: Experiments show that LLM-synthesized IDN reduces GNN accuracy by over 15% on average, while our method recovers up to 12.7% accuracy gain. BeGIN establishes a unified evaluation paradigm for robust graph learning, and our framework provides a practical, semantics-informed pathway toward noise-resilient GNNs.

Technology Category

Application Category

๐Ÿ“ Abstract
Graph Neural Networks (GNNs) have achieved state-of-the-art performance in node classification tasks but struggle with label noise in real-world data. Existing studies on graph learning with label noise commonly rely on class-dependent label noise, overlooking the complexities of instance-dependent noise and falling short of capturing real-world corruption patterns. We introduce BeGIN (Benchmarking for Graphs with Instance-dependent Noise), a new benchmark that provides realistic graph datasets with various noise types and comprehensively evaluates noise-handling strategies across GNN architectures, noisy label detection, and noise-robust learning. To simulate instance-dependent corruptions, BeGIN introduces algorithmic methods and LLM-based simulations. Our experiments reveal the challenges of instance-dependent noise, particularly LLM-based corruption, and underscore the importance of node-specific parameterization to enhance GNN robustness. By comprehensively evaluating noise-handling strategies, BeGIN provides insights into their effectiveness, efficiency, and key performance factors. We expect that BeGIN will serve as a valuable resource for advancing research on label noise in graphs and fostering the development of robust GNN training methods. The code is available at https://github.com/kimsu55/BeGIN.
Problem

Research questions and friction points this paper is trying to address.

Addressing instance-dependent label noise in graph data
Evaluating noise-handling strategies for GNN robustness
Providing realistic benchmarks for graph noise research
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces BeGIN benchmark for instance-dependent noise
Uses algorithmic and LLM-based noise simulation methods
Evaluates node-specific parameterization for GNN robustness
๐Ÿ”Ž Similar Papers
2024-03-07IEEE Transactions on Pattern Analysis and Machine IntelligenceCitations: 71