🤖 AI Summary
Privacy and regulatory constraints make centralized access to multi-domain graph data infeasible. Existing Federated Graph Foundation Models (FedGFMs) suffer from inadequate global codebook modeling: they fail to jointly ensure intra-domain semantic consistency and inter-domain knowledge diversity.
Method: We propose FedBook, a unified federated graph foundation codebook for FedGFMs. During server-side federated pre-training, the server aggregates clients' local codebooks in two phases: (i) intra-domain collaboration, which refines low-frequency tokens by referencing semantically more reliable high-frequency tokens across clients, strengthening domain-specific coherence; and (ii) inter-domain integration, which weights each client's contribution to the global GFM by the semantic distinctiveness of its codebook, preserving cross-domain diversity.
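To make the intra-domain collaboration phase concrete, here is a minimal NumPy sketch. The frequency threshold, the nearest-neighbor matching, and the interpolation weight `alpha` are illustrative assumptions; the paper's exact refinement rule is not specified here.

```python
import numpy as np

def refine_low_frequency_tokens(codebooks, counts, freq_threshold=10, alpha=0.5):
    """Illustrative sketch of intra-domain collaboration: low-frequency
    codebook tokens are pulled toward the nearest high-frequency
    ("semantically reliable") token pooled across clients.

    codebooks: list of (K, d) arrays, one local codebook per client
    counts:    list of (K,) arrays of per-token usage frequencies
    """
    # Pool high-frequency tokens from all clients as reliable references.
    reliable = np.concatenate([
        cb[cnt >= freq_threshold] for cb, cnt in zip(codebooks, counts)
    ])
    refined = []
    for cb, cnt in zip(codebooks, counts):
        cb = cb.copy()
        for i in np.where(cnt < freq_threshold)[0]:
            # Nearest reliable token by Euclidean distance.
            j = np.argmin(np.linalg.norm(reliable - cb[i], axis=1))
            # Interpolate toward it (alpha is an assumed hyperparameter).
            cb[i] = (1 - alpha) * cb[i] + alpha * reliable[j]
        refined.append(cb)
    return refined
```

High-frequency tokens are left untouched, so each client's well-supported semantics are preserved while rarely used tokens borrow structure from more reliable ones.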
Results: On 8 benchmarks spanning multiple domains and tasks, FedBook consistently outperforms 21 baselines, including isolated supervised learning, classical FL/FGL methods, federated adaptations of centralized GFMs, and state-of-the-art FedGFM techniques.
📝 Abstract
Foundation models have shown remarkable cross-domain generalization in language and vision, inspiring the development of graph foundation models (GFMs). However, existing GFMs typically assume centralized access to multi-domain graphs, which is often infeasible due to privacy and institutional constraints. Federated Graph Foundation Models (FedGFMs) address this limitation, but their effectiveness fundamentally hinges on constructing a robust global codebook that achieves intra-domain coherence by consolidating mutually reinforcing semantics within each domain, while also maintaining inter-domain diversity by retaining heterogeneous knowledge across domains. To this end, we propose FedBook, a unified federated graph foundation codebook that systematically aggregates clients' local codebooks during server-side federated pre-training. FedBook follows a two-phase process: (1) Intra-domain Collaboration, where low-frequency tokens are refined by referencing more semantically reliable high-frequency tokens across clients to enhance domain-specific coherence; and (2) Inter-domain Integration, where client contributions are weighted by the semantic distinctiveness of their codebooks during the aggregation of the global GFM, thereby preserving cross-domain diversity. Extensive experiments on 8 benchmarks across multiple domains and tasks demonstrate that FedBook consistently outperforms 21 baselines, including isolated supervised learning, federated and federated graph learning (FL/FGL) methods, federated adaptations of centralized GFMs, and FedGFM techniques.
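The inter-domain integration phase described above can be sketched as follows. The distinctiveness measure (mean distance between codebook centroids) and the softmax weighting are illustrative assumptions standing in for the paper's actual formulation; they only show the shape of distinctiveness-weighted aggregation.

```python
import numpy as np

def aggregate_by_distinctiveness(codebooks, temperature=1.0):
    """Illustrative sketch of inter-domain integration: clients whose
    codebooks are more semantically distinct from the rest receive
    larger weights when forming the global codebook, so heterogeneous
    domain knowledge is not averaged away.

    codebooks: list of (K, d) arrays with a shared shape
    """
    # Summarize each client's codebook by its centroid.
    centroids = np.stack([cb.mean(axis=0) for cb in codebooks])
    n = len(codebooks)
    # Distinctiveness: mean distance from each centroid to all others.
    dist = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=-1)
    distinct = dist.sum(axis=1) / (n - 1)
    # Softmax over distinctiveness scores (temperature is assumed).
    weights = np.exp(distinct / temperature)
    weights /= weights.sum()
    # Global codebook: distinctiveness-weighted average of local codebooks.
    global_cb = sum(w * cb for w, cb in zip(weights, codebooks))
    return global_cb, weights
```

A client from an unusual domain thus contributes more to the global codebook than several near-duplicate clients, which is one simple way to preserve cross-domain diversity during aggregation.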