🤖 AI Summary
Existing surveys on federated graph learning (FGL) predominantly emphasize algorithmic designs and simulation-based evaluations, and lack a systematic taxonomy grounded in data characteristics and usage patterns, which hinders data-centric performance optimization. To address this gap, we propose the first data-oriented, two-tier classification framework: the first tier categorizes FGL settings along three orthogonal dimensions of data structure and distribution properties; the second tier organizes methodologies along three orthogonal dimensions of training workflow and technical enablers. Leveraging this framework, we systematically analyze cross-device collaborative training, privacy-preserving mechanisms, and large-model integration strategies, while covering representative real-world application scenarios. Our work not only redefines the FGL research paradigm but also uncovers fundamental modeling principles under data constraints, providing both theoretical foundations and actionable technical pathways for enhancing model performance.
📝 Abstract
In the era of big data applications, Federated Graph Learning (FGL) has emerged as a prominent solution that reconciles the tradeoff between optimizing the collective intelligence of decentralized dataset holders and maximally preserving sensitive information. Existing FGL surveys have contributed meaningfully but largely focus on integrating Federated Learning (FL) and Graph Machine Learning (GML), resulting in early-stage taxonomies that emphasize methodology and simulated scenarios. Notably, a data-centric perspective, which systematically examines FGL methods through the lens of data properties and usage, has not been adopted to reorganize FGL research, even though it is critical for assessing how FGL studies tackle data-centric constraints to enhance model performance. This survey proposes a two-level data-centric taxonomy: Data Characteristics, which categorizes studies based on the structural and distributional properties of datasets used in FGL, and Data Utilization, which analyzes the training procedures and techniques employed to overcome key data-centric challenges. Each taxonomy level is defined by three orthogonal criteria, each representing a distinct data-centric configuration. Beyond the taxonomy, this survey examines FGL integration with Pretrained Large Models, showcases realistic applications, and highlights future directions aligned with emerging trends in GML.