A Comprehensive Data-centric Overview of Federated Graph Learning

📅 2025-07-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing surveys on federated graph learning (FGL) predominantly emphasize algorithmic designs and simulation-based evaluations and lack a systematic taxonomy grounded in data characteristics and usage patterns, which hinders data-centric performance optimization. To address this gap, we propose the first data-oriented, two-tier classification framework: the first tier categorizes FGL settings along three orthogonal dimensions of data structure and distribution properties; the second tier organizes methodologies along three orthogonal dimensions of training workflow and technical enablers. Leveraging this framework, we systematically analyze cross-device collaborative training, privacy-preserving mechanisms, and large-model integration strategies, while covering representative real-world application scenarios. Our work not only redefines the FGL research paradigm but also uncovers fundamental modeling principles under data constraints, providing both theoretical foundations and actionable technical pathways for enhancing model performance.

📝 Abstract

In the era of big data applications, Federated Graph Learning (FGL) has emerged as a prominent solution that reconciles the tradeoff between optimizing the collective intelligence of decentralized dataset holders and preserving sensitive information to the greatest extent possible. Existing FGL surveys have contributed meaningfully but largely focus on integrating Federated Learning (FL) and Graph Machine Learning (GML), resulting in early-stage taxonomies that emphasize methodology and simulated scenarios. Notably, a data-centric perspective, which systematically examines FGL methods through the lens of data properties and usage, has not yet been adopted to reorganize FGL research, yet it is critical for assessing how FGL studies tackle data-centric constraints to enhance model performance. This survey proposes a two-level data-centric taxonomy: Data Characteristics, which categorizes studies based on the structural and distributional properties of datasets used in FGL, and Data Utilization, which analyzes the training procedures and techniques employed to overcome key data-centric challenges. Each taxonomy level is defined by three orthogonal criteria, each representing a distinct data-centric configuration. Beyond the taxonomy, this survey examines FGL integration with Pretrained Large Models, showcases realistic applications, and highlights future directions aligned with emerging trends in GML.

Problem

Research questions and friction points this paper is trying to address.

Examines Federated Graph Learning from a data-centric perspective

Proposes taxonomy for data characteristics and utilization in FGL

Assesses how FGL studies handle data-centric constraints to improve model performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-level data-centric taxonomy for FGL

Integrates Pretrained Large Models with FGL

Focuses on Data Characteristics and Utilization

🔎 Similar Papers

FedGraph: A Research Library and Benchmark for Federated Graph Learning