Simplifying Graph Transformers

📅 2025-04-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing graph Transformers struggle to inherit the training advancements and transferability of standard Transformers due to reliance on message passing or complex attention mechanisms. This work proposes GraphFormer, a lightweight graph-adaptation framework that enables efficient transfer via three simple yet effective designs: (1) edge-aware attention based on L₂ distance, explicitly encoding geometric relationships between node pairs; (2) adaptive RMS normalization, improving training stability and generalization; and (3) shared-encoder relative positional biases, enhancing structural awareness without modifying the Transformer backbone. GraphFormer preserves architectural compatibility with the plain Transformer, achieves significant performance gains across multiple graph learning benchmarks, and empirically demonstrates strong graph isomorphism discrimination on an expressiveness benchmark.
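Design (1) can be illustrated with a short sketch. The NumPy snippet below assumes attention logits are negative squared L₂ distances between query and key vectors, scaled by √d; the paper's exact parameterization (per-head scaling, additional edge terms) may differ, so treat this as a minimal sketch rather than the authors' implementation.

```python
import numpy as np

def l2_attention(Q, K, V):
    """Attention whose logits are negative squared L2 distances between
    queries and keys (assumed form; closer tokens get larger logits)."""
    d = Q.shape[-1]
    # Pairwise squared L2 distances: sq_dists[i, j] = ||q_i - k_j||^2
    sq_dists = np.sum((Q[:, None, :] - K[None, :, :]) ** 2, axis=-1)
    logits = -sq_dists / np.sqrt(d)
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # convex combination of values
```

Because logits grow more negative as tokens move apart, attention mass concentrates on keys that are close to the query in L₂ distance rather than on keys with large dot products, which is the "magnitude closeness" the abstract refers to.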

📝 Abstract
Transformers have attained outstanding performance across various modalities, employing scaled-dot-product (SDP) attention mechanisms. Researchers have attempted to migrate Transformers to graph learning, but most advanced Graph Transformers are designed with major architectural differences, either integrating message-passing or incorporating sophisticated attention mechanisms. These complexities prevent the easy adoption of Transformer training advances. We propose three simple modifications to the plain Transformer to render it applicable to graphs without introducing major architectural distortions. Specifically, we advocate for the use of (1) simplified $L_2$ attention to measure the magnitude closeness of tokens; (2) adaptive root-mean-square normalization to preserve token magnitude information; and (3) a relative positional encoding bias with a shared encoder. Significant performance gains across a variety of graph datasets demonstrate the effectiveness of our proposed modifications. Furthermore, empirical evaluation on the expressiveness benchmark reveals noteworthy realized expressiveness on graph isomorphism tasks.
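Modification (2), adaptive RMS normalization, is described only as preserving token magnitude information. One plausible reading, sketched below, blends the RMS-normalized token with the raw token through a gate; the gate `alpha` and its placement are illustrative assumptions, not the paper's stated parameterization.

```python
import numpy as np

def adaptive_rms_norm(x, gain, alpha, eps=1e-6):
    """Hypothetical adaptive RMSNorm: interpolates between the
    RMS-normalized token (alpha=1, standard RMSNorm) and the raw
    token (alpha=0), so some magnitude information can survive
    normalization. `gain` and `alpha` would be learned in practice."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    normed = gain * x / rms
    return alpha * normed + (1.0 - alpha) * x
```

With `alpha = 1` this reduces to ordinary RMSNorm, which discards each token's scale; an intermediate `alpha` lets downstream attention still see relative token magnitudes, which matters when attention itself is distance-based as in modification (1).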
Problem

Research questions and friction points this paper is trying to address.

Simplifying Graph Transformers for easier adoption
Enhancing graph learning with plain Transformer modifications
Improving performance across diverse graph datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simplified L2 attention for token closeness
Adaptive RMS normalization for token magnitude
Relative positional encoding with shared encoder
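The third modification, a relative positional bias produced by an encoder shared across attention layers, can be sketched as follows. The use of shortest-path hop distance as the relative feature and a clipped lookup table as the shared encoder are illustrative assumptions (in the spirit of Graphormer-style spatial biases); the paper's encoder may differ.

```python
import numpy as np
from collections import deque

def shortest_path_distances(adj):
    """All-pairs shortest-path hop counts via BFS (unreachable -> -1)."""
    n = len(adj)
    dist = -np.ones((n, n), dtype=int)
    for s in range(n):
        dist[s, s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if adj[u][v] and dist[s, v] < 0:
                    dist[s, v] = dist[s, u] + 1
                    q.append(v)
    return dist

def relative_position_bias(adj, table, max_dist=4):
    """Map clipped hop distances through one shared table of shape
    (n_heads, max_dist + 1) to a per-head additive attention bias.
    Unreachable pairs fall into the last bucket (an illustrative choice)."""
    d = shortest_path_distances(adj)
    d = np.where(d < 0, max_dist, np.minimum(d, max_dist))
    return table[:, d]   # shape (n_heads, n, n), added to attention logits
```

Because a single table is reused, every layer adds the same structural bias to its logits, injecting graph structure without altering the Transformer backbone itself.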