🤖 AI Summary
Graph Transformers (GTs) can dilute local neighborhood information when global attention is applied alongside or after GNN layers, which leaves the learned graph representations incomplete. To address this, we propose G2LFormer, a GT architecture built on a “global-to-local” attention scheme: shallow layers use attention to capture long-range dependencies, while deeper layers use GNN modules to learn local structure. A cross-layer information fusion strategy lets the local layers retain useful information from the global layers and reduces information loss, at an acceptable cost in scalability. Compared with state-of-the-art linear GTs and GNNs on node-level and graph-level tasks, G2LFormer performs strongly while keeping linear complexity.
📝 Abstract
Graph Transformers (GTs) show considerable potential in graph representation learning. GT architectures typically combine Graph Neural Networks (GNNs) with global attention mechanisms, placing the GNN either in parallel with the attention mechanism or before it, which yields a local-and-global or local-to-global attention scheme. However, because the global attention mechanism primarily captures long-range dependencies between nodes, these integration schemes may suffer from information loss: the local neighborhood information learned by the GNN can be diluted by the attention mechanism. We therefore propose G2LFormer, which features a novel global-to-local attention scheme in which the shallow network layers use attention mechanisms to capture global information, while the deeper layers employ GNN modules to learn local structural information, thereby preventing nodes from ignoring their immediate neighbors. An effective cross-layer information fusion strategy allows the local layers to retain beneficial information from the global layers and alleviates information loss, with acceptable trade-offs in scalability. To validate the feasibility of the global-to-local attention scheme, we compare G2LFormer with state-of-the-art linear GTs and GNNs on node-level and graph-level tasks. The results indicate that G2LFormer exhibits excellent performance while keeping linear complexity.
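The global-to-local ordering described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the attention kernel, the GNN module, or the fusion rule, so the `elu + 1` feature map, the mean-aggregation local layer, the fixed mixing weight `alpha`, and all function and parameter names below are assumptions chosen only to show the data flow (attention layers first, neighborhood layers last, global features carried into each local layer).

```python
import numpy as np


def feature_map(z):
    # elu(z) + 1, a strictly positive kernel feature map (an assumed choice).
    return np.where(z > 0, z + 1.0, np.exp(np.minimum(z, 0.0)))


def linear_attention(x, wq, wk, wv):
    # Kernelized attention that never forms the N x N score matrix,
    # so its cost grows linearly with the number of nodes N.
    q, k, v = feature_map(x @ wq), feature_map(x @ wk), x @ wv
    kv = k.T @ v                    # (d, d) summary of all keys and values
    norm = q @ k.sum(axis=0)        # (N,) per-node normalizer, always > 0
    return (q @ kv) / norm[:, None]


def local_layer(x, src, dst, w):
    # Mean aggregation over each node's neighbors plus itself, followed
    # by a linear map and ReLU. Works on an edge list, so cost is O(E + N).
    agg = x.copy()                  # self-loop contribution
    deg = np.ones(x.shape[0])
    np.add.at(agg, dst, x[src])     # sum features arriving along each edge
    np.add.at(deg, dst, 1.0)
    return np.maximum((agg / deg[:, None]) @ w, 0.0)


def g2l_forward(x, src, dst, params, alpha=0.5):
    h = x
    # Shallow layers: global attention with a residual connection.
    for wq, wk, wv in params["global"]:
        h = h + linear_attention(h, wq, wk, wv)
    global_feat = h
    # Deeper layers: local message passing. The convex mix with
    # global_feat is a hypothetical stand-in for cross-layer fusion.
    for w in params["local"]:
        h = (1.0 - alpha) * local_layer(h, src, dst, w) + alpha * global_feat
    return h


# Toy graph: two undirected paths, 0-1-2 and 3-4-5, stored as directed edges.
src = np.array([0, 1, 1, 2, 3, 4, 4, 5])
dst = np.array([1, 0, 2, 1, 4, 3, 5, 4])
rng = np.random.default_rng(0)
n, d = 6, 4
x = rng.normal(size=(n, d))
params = {
    "global": [tuple(0.1 * rng.normal(size=(d, d)) for _ in range(3))],
    "local": [0.1 * rng.normal(size=(d, d)) for _ in range(2)],
}
h = g2l_forward(x, src, dst, params)
print(h.shape)  # (6, 4)
```

In this sketch, nodes 0-2 and 3-5 can exchange information only through the attention layers, while the final layers average strictly over graph neighbors; this is the sense in which the last word goes to local structure. Neither stage materializes an N x N matrix, which is what keeps the cost linear in the number of nodes and edges.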