🤖 AI Summary
This work addresses the uneven, and sometimes negative, cross-domain transfer performance commonly observed in Graph Foundation Models (GFMs), a phenomenon whose root cause remains unclear. Taking a data-centric perspective, the study establishes the first theoretical framework for cross-domain GFM transfer based on graphon theory. It explicitly decomposes the output shift into a structural-discrepancy term and finite-sample effects, and introduces a relabeling-invariant measure of domain discrepancy. The framework combines graphon-based continuous modeling, Lipschitz analysis of backbone networks, and stability theory for spectral positional encodings. It is validated empirically on synthetic and real-world graphs, including comparisons between subspace-based and eigenvector-based positional encodings. The results provide actionable guidance for selecting and constructing data in GFM transfer scenarios.
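One generic way such a decomposition can arise is a triangle-inequality argument. The sketch below rests on assumptions that are not stated in the source: each domain is modeled by a graphon (W_1, W_2), each graph G_i is represented by its step graphon W_{G_i}, and the backbone f (together with its tokenization and PE) is invariant to measure-preserving relabelings and L-Lipschitz in the cut norm. The paper's actual metric, constants, and PE-dependent terms may differ.

```latex
% Assumed: \|f(U) - f(V)\| \le L \|U - V\|_\square and f(V^\varphi) = f(V)
% for every measure-preserving relabeling \varphi. Since the bound holds
% for every \varphi, it also holds for the infimum:
\|f(U) - f(V)\| \le L\,\delta_\square(U, V),
\qquad
\delta_\square(U, V) = \inf_{\varphi} \|U - V^{\varphi}\|_\square .

% Triangle inequality through the domain graphons W_1 and W_2:
\|f(W_{G_1}) - f(W_{G_2})\|
  \le \underbrace{L\,\delta_\square(W_{G_1}, W_1)}_{\text{finite-sample, } G_1}
    + \underbrace{L\,\delta_\square(W_1, W_2)}_{\text{domain discrepancy}}
    + \underbrace{L\,\delta_\square(W_2, W_{G_2})}_{\text{finite-sample, } G_2}
```

In this sketch, the middle term depends only on the two domains. The outer terms depend on the particular graphs, and for W-random graphs they tend to zero as the number of nodes grows.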
📝 Abstract
Graph foundation models (GFMs) aim to reuse a single backbone across diverse graph domains, yet their transfer is often uneven and can even be negative. While most prior work improves transfer through architectural or adaptation choices, we ask a data-centric question: which properties of two graph domains determine how much a fixed representation model's outputs shift between them? Using a graphon-based continuous limit for dense graphs, we show that for both set-based and message-passing tokenizations, any Lipschitz backbone admits an explicit decomposition of the cross-domain output shift into (i) graph-specific finite-sample approximation terms and (ii) an intrinsic, relabeling-invariant domain discrepancy capturing structural mismatch. A key ingredient is positional-encoding (PE) stability: we establish stability guarantees for spectral PEs and highlight the contrasting behavior of eigenvector-based and subspace-based PEs. Experiments on synthetic and real graphs validate the theory and translate the decomposition into guidance for data curation in GFM transfer.
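The contrast between eigenvector-based and subspace-based PEs can be illustrated with a toy example. This is a minimal sketch, not the paper's construction. It assumes that a "subspace PE" is the orthogonal projector U U^T onto the k lowest Laplacian eigenvectors, and it uses a 12-node cycle as a hypothetical test case, because the cycle's second and third Laplacian eigenvalues coincide. Reweighting one edge by 1 + eps at two different locations selects two orthogonal bases for that degenerate eigenspace. As a result, the individual eigenvector changes by about sqrt(2) even after sign alignment, while the projector moves only on the order of eps, in line with Davis-Kahan-type subspace perturbation bounds. All function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers: a toy illustration, not the paper's PE definitions.


def cycle_laplacian(n: int, edge: int, eps: float) -> np.ndarray:
    """Combinatorial Laplacian L = D - A of the n-cycle, with edge
    (edge, edge + 1) reweighted to 1 + eps (a toy perturbation)."""
    A = np.zeros((n, n))
    for i in range(n):
        j = (i + 1) % n
        w = 1.0 + eps if i == edge else 1.0
        A[i, j] = A[j, i] = w
    return np.diag(A.sum(axis=1)) - A


def eigvec_pe(L: np.ndarray, k: int) -> np.ndarray:
    """Eigenvector PE: the k eigenvectors with smallest eigenvalues (n x k)."""
    _, U = np.linalg.eigh(L)  # eigenvalues ascending, eigenvectors as columns
    return U[:, :k]


def subspace_pe(L: np.ndarray, k: int) -> np.ndarray:
    """Subspace PE (assumed form): projector U U^T onto the same eigenspace.
    It is unchanged by sign flips or rotations of U within the subspace."""
    U = eigvec_pe(L, k)
    return U @ U.T


def sign_aligned_dist(u: np.ndarray, v: np.ndarray) -> float:
    """Distance between two unit eigenvectors, modulo the sign ambiguity."""
    return float(min(np.linalg.norm(u - v), np.linalg.norm(u + v)))


n, k, eps = 12, 3, 1e-3
# Two graphs that differ from the same 12-cycle by a tiny reweighting of
# different edges; the unperturbed 2nd/3rd eigenvalues are degenerate.
L_a = cycle_laplacian(n, edge=0, eps=eps)
L_b = cycle_laplacian(n, edge=3, eps=eps)

# Eigenvector PE: the 2nd eigenvector jumps even after sign alignment.
vec_gap = sign_aligned_dist(eigvec_pe(L_a, k)[:, 1], eigvec_pe(L_b, k)[:, 1])
# Subspace PE: the projectors differ only at the order of eps.
proj_gap = float(np.linalg.norm(subspace_pe(L_a, k) - subspace_pe(L_b, k)))

print(f"eigenvector PE change: {vec_gap:.3f}")
print(f"subspace PE change:    {proj_gap:.2e}")
```

The projector is compared instead of U itself because U U^T is invariant to exactly the ambiguity this example exploits: sign flips and rotations of the basis within a degenerate eigenspace.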