🤖 AI Summary
This work addresses two challenges in federated graph foundation models: knowledge loss caused by discrete quantization, and the difficulty of achieving semantic-structural alignment under data heterogeneity and communication constraints. To this end, the authors propose a quantization-free alignment strategy that enforces semantic-structural consistency between a frozen pre-trained language model and a graph neural network in a continuous embedding space. The approach leverages unsupervised contrastive learning to capture transferable, generalizable knowledge and incorporates a lightweight prompt-tuning mechanism to adapt to downstream tasks without full-parameter fine-tuning. This design improves both generalization and communication efficiency, consistently outperforming existing baselines across multiple datasets, with performance gains of up to 14.37%.
📝 Abstract
Recent studies of federated graph foundation models (FedGFMs) break the idealized and untenable assumption of centralized data storage for training graph foundation models, accommodating the reality of distributed, privacy-restricted data silos. Despite their intuitive simplicity, existing approaches that project aligned generalizable knowledge onto a discrete token space via vector-quantized backbones suffer from irreversible knowledge loss during quantization. In this context, we argue that reconciling the semantic-structural orthogonality and integrity between pre-trained language models (PLMs) and graph neural networks (GNNs) is paramount for developing effective FedGFMs, while simultaneously mitigating the severe data heterogeneity and communication constraints inherent in distributed, resource-limited environments. To address these issues, we propose FedGALA (Federated Graph And Language Alignment), a framework that resolves graph-based semantic-structural orthogonality and integrity in federated settings by employing unsupervised contrastive learning to align GNNs and frozen PLMs within a continuous embedding space, thereby capturing robust, transferable general knowledge. Subsequently, FedGALA leverages a communication-efficient prompt-tuning mechanism to steer the pre-aligned encoders and frozen PLMs, facilitating effective adaptation to diverse downstream tasks while circumventing the prohibitive overhead of full-parameter fine-tuning. Comprehensive experiments validate that FedGALA outperforms all competitive baselines across multi-domain datasets on multiple tasks, with up to 14.37% performance improvement.
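The abstract does not give the concrete form of the alignment objective, but the described quantization-free contrastive alignment between GNN and frozen-PLM embeddings can be illustrated with a symmetric InfoNCE-style loss in a shared continuous space. The sketch below is an illustrative assumption, not FedGALA's actual implementation: the function name, the temperature value, and the pairing of row `i` of each embedding matrix as a matched graph-text pair are all hypothetical.

```python
import numpy as np

def info_nce_alignment(gnn_emb, plm_emb, temperature=0.07):
    """Illustrative symmetric InfoNCE loss aligning GNN and frozen-PLM
    embeddings in a continuous space (no discrete quantization step).
    Assumes row i of gnn_emb and row i of plm_emb describe the same node."""
    # L2-normalize so dot products are cosine similarities
    g = gnn_emb / np.linalg.norm(gnn_emb, axis=1, keepdims=True)
    t = plm_emb / np.linalg.norm(plm_emb, axis=1, keepdims=True)
    logits = g @ t.T / temperature  # (N, N) pairwise similarity matrix

    def xent(l):
        # cross-entropy with the diagonal (matched pairs) as positives
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # symmetric: graph-to-text and text-to-graph retrieval directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

In a federated setting, each client would compute such a loss on its local graph-text pairs while the PLM stays frozen, so only the (small) GNN or prompt parameters need to be communicated.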