🤖 AI Summary
Existing surveys on graph foundation models (GFMs) suffer from outdated coverage, ambiguous taxonomies of self-supervised methods, and an overreliance on architecture-specific perspectives, all of which hinder a systematic understanding of how general graph knowledge is learned. To address these limitations, we propose a knowledge-based, three-tier classification framework (microscopic, mesoscopic, and macroscopic) that spans nine categories of graph knowledge and more than 25 pretext tasks, unifying multi-level representations of nodes, structures, and semantics. We introduce the first knowledge-guided taxonomy for self-supervised GFMs, moving beyond traditional architecture-centric paradigms to accommodate emerging directions such as graph language models. Furthermore, we establish explicit mappings among knowledge types, pretext tasks, and downstream generalization strategies. The resulting framework covers recent state-of-the-art advances and improves model interpretability, downstream generalization, and cross-task reusability.
📝 Abstract
Graph self-supervised learning (SSL) has become the go-to method for pre-training graph foundation models (GFMs). A wide variety of knowledge patterns is embedded in graph data, such as node properties and clusters, and these patterns are crucial for learning generalized representations for GFMs. However, existing surveys of GFMs have several shortcomings: they lack comprehensive coverage of the most recent progress, categorize self-supervised methods unclearly, and take a limited architecture-based perspective restricted to certain types of graph models. Since the ultimate goal of GFMs is to learn generalized graph knowledge, we provide a comprehensive survey of self-supervised GFMs from a novel knowledge-based perspective. We propose a knowledge-based taxonomy that categorizes self-supervised graph models by the specific graph knowledge they utilize. Our taxonomy consists of microscopic knowledge (nodes, links, etc.), mesoscopic knowledge (context, clusters, etc.), and macroscopic knowledge (global structure, manifolds, etc.). In total, it covers 9 knowledge categories and more than 25 pretext tasks for pre-training GFMs, along with various downstream task generalization strategies. This knowledge-based taxonomy allows us to re-examine graph models built on new architectures, such as graph language models, more clearly, and to provide deeper insights for constructing GFMs.
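To make the taxonomy concrete, the sketch below illustrates one microscopic pretext task, masked link prediction, in plain PyTorch: an edge is hidden from the graph, and a small GCN-style encoder is trained so that node-embedding dot products distinguish the held-out edge from a non-edge. This is a minimal illustration under our own assumptions; the encoder (`TinyGCN`), the toy graph, and all hyperparameters are hypothetical and do not correspond to any specific method covered by the survey.

```python
# Hypothetical sketch of a microscopic pretext task (masked link prediction).
# TinyGCN, the toy graph, and all hyperparameters are illustrative assumptions,
# not the survey's reference implementation. Requires only PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGCN(nn.Module):
    """Two-layer GCN-style encoder over a dense normalized adjacency."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, out_dim)

    def forward(self, a_norm, x):
        h = F.relu(self.w1(a_norm @ x))   # propagate neighbors, then transform
        return self.w2(a_norm @ h)        # final node embeddings

def normalize_adj(a):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    a = a + torch.eye(a.size(0))
    d_inv_sqrt = a.sum(1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)

# Toy graph: a 6-node cycle with random 8-dim node features.
edges = torch.tensor([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 0]])
n, x = 6, torch.randn(6, 8)
a = torch.zeros(n, n)
a[edges[:, 0], edges[:, 1]] = 1.0
a[edges[:, 1], edges[:, 0]] = 1.0

# Pretext task: hide edge (0, 1), then train the encoder so that embedding
# dot products score the held-out edge above a non-adjacent (negative) pair.
masked = a.clone()
masked[0, 1] = masked[1, 0] = 0.0     # the masked positive pair
a_norm = normalize_adj(masked)

enc = TinyGCN(8, 16, 16)
opt = torch.optim.Adam(enc.parameters(), lr=1e-2)
for step in range(100):
    z = enc(a_norm, x)
    pos = (z[0] * z[1]).sum()         # score for the masked edge
    neg = (z[0] * z[3]).sum()         # score for a non-adjacent pair
    loss = F.binary_cross_entropy_with_logits(
        torch.stack([pos, neg]), torch.tensor([1.0, 0.0]))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Mesoscopic and macroscopic pretext tasks follow the same recipe with different reconstruction targets, e.g., cluster assignments or graph-level properties, while the encoder itself stays unchanged.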