🤖 AI Summary
To address inherent modeling limitations of Graph Neural Networks (GNNs)—such as over-smoothing and over-squashing—this survey systematically examines Graph Transformers (GTs), covering their architectural design, theoretical foundations, and cross-domain applications. We propose the first unified taxonomy encompassing key components: graph tokenization, structure-aware attention mechanisms, positional encoding schemes, and model integration strategies. We establish a theoretical framework for characterizing GT expressivity, rigorously delineating their capacity relative to GNNs and identifying complementary strengths. Empirically, we comprehensively review GT deployments across molecular modeling, protein structure prediction, natural language processing, computer vision, traffic forecasting, neuroscience, and materials science. This work fills a critical gap in the literature by providing the first holistic, up-to-date synthesis of GT research, clarifying fundamental challenges—including scalability, structural inductive bias, and efficient training—and charting concrete directions for both theoretical advancement and practical deployment.
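To make the positional-encoding component of the taxonomy concrete, below is a minimal sketch (not code from the survey) of Laplacian eigenvector positional encodings, one of the most common schemes in the GT literature. The function name `laplacian_pe` and the toy 4-cycle graph are illustrative assumptions.

```python
import torch

def laplacian_pe(adj: torch.Tensor, k: int) -> torch.Tensor:
    """Laplacian eigenvector positional encodings for a simple undirected
    graph: the k smallest nontrivial eigenvectors of the symmetrically
    normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(dim=1)
    d_inv_sqrt = deg.clamp(min=1).pow(-0.5)
    lap = torch.eye(adj.size(0)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    eigvals, eigvecs = torch.linalg.eigh(lap)
    # Skip the trivial eigenvector (eigenvalue 0). Note: eigenvector signs
    # are arbitrary, which is why many GT models randomize signs in training.
    return eigvecs[:, 1 : k + 1]

# Toy usage: each node of a 4-cycle gets a 2-dimensional positional vector.
adj = torch.tensor([[0, 1, 0, 1],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [1, 0, 1, 0]], dtype=torch.float)
pe = laplacian_pe(adj, k=2)  # shape (4, 2)
```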
📝 Abstract
Graph Transformers (GTs) have demonstrated strong capabilities in modeling graph-structured data by addressing intrinsic limitations of graph neural networks (GNNs), such as over-smoothing and over-squashing. Recent studies have proposed diverse architectures, enhanced explainability, and practical applications for Graph Transformers. In light of these rapid developments, this survey provides a comprehensive review of Graph Transformers, covering their architectures, theoretical foundations, and applications. We categorize Graph Transformer architectures according to their strategies for processing structural information, including graph tokenization, positional encoding, structure-aware attention, and model ensembling. From a theoretical perspective, we examine the expressivity of Graph Transformers across the discussed architectures and contrast them with other advanced graph learning algorithms to uncover their connections. Furthermore, we summarize the practical applications where Graph Transformers have been utilized, spanning molecular, protein, language, vision, traffic, brain, and material data. Finally, we discuss current challenges and prospective directions of Graph Transformers for future research.
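As an illustration of the structure-aware attention category, here is a self-contained PyTorch sketch of a single attention head with an additive shortest-path-distance bias, in the spirit of Graphormer-style spatial encodings. The class name, the `max_dist` bucketing, and the input shapes are assumptions for illustration, not an implementation from the survey.

```python
import torch
import torch.nn as nn

class StructureAwareAttention(nn.Module):
    """Single-head attention with an additive structural bias: attention
    scores are shifted by a learnable scalar per shortest-path distance."""

    def __init__(self, dim: int, max_dist: int = 8):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # One learnable bias per shortest-path-distance bucket.
        self.dist_bias = nn.Embedding(max_dist + 1, 1)
        self.max_dist = max_dist
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, spd: torch.Tensor) -> torch.Tensor:
        # x:   (n, dim) node features, one token per node
        # spd: (n, n)   integer shortest-path distances between nodes
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = (q @ k.T) * self.scale                        # (n, n)
        spd = spd.clamp(max=self.max_dist)                     # bucket long paths
        scores = scores + self.dist_bias(spd).squeeze(-1)      # structural bias
        return scores.softmax(dim=-1) @ v                      # (n, dim)

# Toy usage on 4 nodes, using the shortest-path-distance matrix of a 4-cycle.
n, dim = 4, 16
x = torch.randn(n, dim)
spd = torch.tensor([[0, 1, 2, 1],
                    [1, 0, 1, 2],
                    [2, 1, 0, 1],
                    [1, 2, 1, 0]])
out = StructureAwareAttention(dim)(x, spd)  # (4, 16)
```

Because every node attends to every other node, with structure injected only through the bias term, this sketch also shows why such models sidestep the over-squashing that afflicts message passing over long graph distances.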