Data-centric Federated Graph Learning with Large Language Models

📅 2025-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address slow convergence and poor performance in federated graph learning (FGL) caused by non-IID client data, this paper proposes, for the first time, a **data-level heterogeneity mitigation paradigm**. Without fine-tuning large language models (LLMs), it leverages their zero-shot generation and reflective reasoning capabilities to synthesize missing neighbors and infer latent edges on local subgraphs. These are integrated with a pre-trained edge predictor and text-based attribute augmentation to construct high-quality pseudo-graph structures. The method employs a parameter-free federated generation–reflection mechanism, enabling plug-and-play integration into existing FGL frameworks. Theoretically, it decouples the LLM’s dual roles—generation and verification—in FGL. Extensive experiments on three real-world graph datasets demonstrate significant improvements in global model convergence speed and generalization performance, empirically validating the efficacy of data augmentation for mitigating non-IID graph distribution shifts.
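The generation-and-reflection mechanism summarized above can be sketched as a simple loop: generate a candidate neighbor, score it with aggregated client feedback, and refine via prompt feedback rather than parameter updates. Everything below (`llm_generate`, the scorer interface, the threshold) is a hypothetical stand-in, not an API from the paper:

```python
def llm_generate(node_text, feedback=""):
    # Stand-in for a zero-shot LLM call that synthesizes a missing
    # neighbor's text attribute, optionally conditioned on feedback.
    suffix = f" | revised per: {feedback}" if feedback else ""
    return f"neighbor of [{node_text}]{suffix}"

def generate_with_reflection(node_text, score_fn, max_rounds=3, threshold=0.8):
    """Generate a pseudo-neighbor, then refine it using collective client
    feedback; no LLM parameters are updated (parameter-free reflection)."""
    feedback = ""
    candidate, score = None, 0.0
    for _ in range(max_rounds):
        candidate = llm_generate(node_text, feedback)
        score = score_fn(candidate)  # aggregated quality signal from all clients
        if score >= threshold:
            break
        feedback = f"previous draft scored {score:.2f}; be more specific"
    return candidate, score
```

The key property this sketch preserves is that reflection flows through the prompt (the `feedback` string), so the mechanism stays plug-and-play for any frozen LLM.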

📝 Abstract
In federated graph learning (FGL), a complete graph is divided into multiple subgraphs stored in each client due to privacy concerns, and all clients jointly train a global graph model by only transmitting model parameters. A pain point of FGL is the heterogeneity problem, where nodes or structures present non-IID properties among clients (e.g., different node label distributions), dramatically undermining the convergence and performance of FGL. To address this, existing efforts focus on design strategies at the model level, i.e., they design models to extract common knowledge to mitigate heterogeneity. However, these model-level strategies fail to fundamentally address the heterogeneity problem as the model needs to be designed from scratch when transferring to other tasks. Motivated by large language models (LLMs) having achieved remarkable success, we aim to utilize LLMs to fully understand and augment local text-attributed graphs, to address data heterogeneity at the data level. In this paper, we propose a general framework LLM4FGL that innovatively decomposes the task of LLM for FGL into two sub-tasks theoretically. Specifically, for each client, it first utilizes the LLM to generate missing neighbors and then infers connections between generated nodes and raw nodes. To improve the quality of generated nodes, we design a novel federated generation-and-reflection mechanism for LLMs, without the need to modify the parameters of the LLM but relying solely on the collective feedback from all clients. After neighbor generation, all the clients utilize a pre-trained edge predictor to infer the missing edges. Furthermore, our framework can seamlessly integrate as a plug-in with existing FGL methods. Experiments on three real-world datasets demonstrate the superiority of our method compared to advanced baselines.
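The abstract's "plug-in" claim can be illustrated with a minimal sketch: run a data-level augmentation hook once per client, then continue with an unmodified FedAvg-style loop that only exchanges model parameters. Everything here (`augment_graph`, the toy local update, scalar "graphs") is an illustrative assumption, not the paper's implementation:

```python
def fedavg_with_augmentation(client_graphs, augment_graph, local_update,
                             init_weights, rounds=3):
    """Augment each client's local data once, then run plain FedAvg."""
    # Data-level plug-in step: the rest of the pipeline is untouched.
    graphs = [augment_graph(g) for g in client_graphs]
    w = list(init_weights)
    for _ in range(rounds):
        # Each client trains locally and returns its parameter vector.
        local_ws = [local_update(g, w) for g in graphs]
        # Server averages parameters; no raw graph data is transmitted.
        w = [sum(col) / len(col) for col in zip(*local_ws)]
    return w
```

Because augmentation happens strictly before (and independently of) the parameter-exchange loop, the same hook can wrap any existing FGL method.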
Problem

Research questions and friction points this paper is trying to address.

- Non-IID node and structure distributions across clients dramatically undermine the convergence and performance of federated graph learning
- Existing model-level strategies must be redesigned from scratch when transferring to other tasks, so they do not fundamentally resolve heterogeneity
- How to leverage LLMs to understand and augment local text-attributed graphs, mitigating heterogeneity at the data level
Innovation

Methods, ideas, or system contributions that make the work stand out.

- LLM-based augmentation of local text-attributed graphs (zero-shot generation of missing neighbors)
- Parameter-free federated generation-and-reflection mechanism driven by collective client feedback
- Pre-trained edge predictor to infer connections between generated and raw nodes
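The digest does not specify the edge predictor's architecture; as a hedged sketch, a similarity-threshold rule over node embeddings can play its role of connecting generated nodes to raw nodes (`cosine`, `tau`, and the embedding dicts are all assumptions, not details from the paper):

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def infer_edges(raw_emb, gen_emb, tau=0.9):
    """Connect each generated node to raw nodes whose embedding similarity
    exceeds tau -- a stand-in for the paper's pre-trained edge predictor."""
    return [(g, r)
            for g, ge in gen_emb.items()
            for r, re_ in raw_emb.items()
            if cosine(ge, re_) >= tau]
```

A learned predictor would replace the fixed threshold with a trained scoring function, but the interface (generated embeddings in, candidate edges out) is the same.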