🤖 AI Summary
Existing graph Transformers struggle to inherit the training advances and transferability of standard Transformers because they rely on message passing or complex attention mechanisms. This work proposes GraphFormer, a lightweight graph-adaptation framework that enables efficient transfer through three simple yet effective designs: (1) edge-aware attention based on L₂ distance, which explicitly encodes geometric relationships between node pairs; (2) adaptive RMS normalization, which improves training stability and generalization; and (3) shared-encoder relative positional biases, which enhance structural awareness without modifying the Transformer backbone. GraphFormer preserves architectural compatibility with standard Transformers, achieves state-of-the-art performance across multiple graph learning benchmarks, and shows noteworthy realized expressiveness on graph isomorphism, as measured by empirical expressivity evaluation.
📝 Abstract
Transformers have attained outstanding performance across various modalities using scaled dot-product (SDP) attention. Researchers have attempted to migrate Transformers to graph learning, but most advanced graph Transformers are designed with major architectural differences, either integrating message passing or incorporating sophisticated attention mechanisms. These complexities prevent the easy adoption of advances in Transformer training. We propose three simple modifications that make the plain Transformer applicable to graphs without major architectural distortion: (1) simplified $L_2$ attention, which measures the magnitude closeness of tokens; (2) adaptive root-mean-square normalization, which preserves token magnitude information; and (3) a relative positional encoding bias with a shared encoder. Significant performance gains across a variety of graph datasets demonstrate the effectiveness of the proposed modifications. Furthermore, empirical evaluation on an expressiveness benchmark reveals noteworthy realized expressiveness in graph isomorphism testing.
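As a rough illustration only (not the authors' implementation), the first two modifications could be sketched as below. Both function forms are assumptions for concreteness: `l2_attention` replaces dot-product scores with negative squared $L_2$ distances, and `adaptive_rms_norm` uses a hypothetical `tanh` magnitude gate to retain token-norm information that plain RMSNorm would discard.

```python
import numpy as np

def l2_attention(Q, K, V):
    """Attention with scores from negative squared pairwise L2 distance
    (hypothetical sketch of the 'simplified L2 attention' idea)."""
    d = Q.shape[-1]
    # squared L2 distance between every query/key pair -> (n_q, n_k)
    sq = ((Q[:, None, :] - K[None, :, :]) ** 2).sum(-1)
    scores = -sq / np.sqrt(d)
    # numerically stable softmax over keys
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ V

def adaptive_rms_norm(x, gamma, eps=1e-6):
    """RMS-normalize each token, then rescale by a magnitude-dependent
    gate so token-norm information survives (assumed form, not the paper's)."""
    rms = np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)
    return (x / rms) * gamma * np.tanh(rms)
```

Note that with $Q = K$, each query's distance to its own key is zero, so self-scores are maximal; identical tokens attend to themselves regardless of scale, which is the "magnitude closeness" behavior dot-product attention lacks.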