AI Summary
This work proposes HopFormer, a novel graph Transformer that addresses the high computational cost and limited receptive field control of conventional approaches relying on dense global attention and explicit positional encodings. HopFormer introduces, for the first time, a head-specific n-hop masked sparse attention mechanism that explicitly models graph structural information and enables precise control over the receptive field, without requiring positional encodings or architectural modifications. The method achieves linear scalability and computational efficiency while delivering competitive or superior performance across diverse node-level and graph-level benchmarks. Notably, it demonstrates that localized attention is more stable and effective than global attention in small-world graphs, challenging the prevailing assumption that graph Transformers inherently depend on global interactions.
Abstract
Graph Transformers typically rely on explicit positional or structural encodings and dense global attention to incorporate graph topology. In this work, we show that neither is essential. We introduce HopFormer, a graph Transformer that injects structure exclusively through head-specific n-hop masked sparse attention, without the use of positional encodings or architectural modifications. This design provides explicit and interpretable control over receptive fields while enabling genuinely sparse attention whose computational cost scales linearly with mask sparsity. Through extensive experiments on both node-level and graph-level benchmarks, we demonstrate that our approach achieves competitive or superior performance across diverse graph structures. Our results further reveal that dense global attention is often unnecessary: on graphs with strong small-world properties, localized attention yields more stable and consistently high performance, while on graphs with weaker small-world effects, global attention offers diminishing returns. Together, these findings challenge prevailing assumptions in graph Transformer design and highlight sparsity-controlled attention as a principled and efficient alternative.
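The core mechanism described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration of head-specific n-hop masked attention, not the authors' implementation: all function names, parameter names, and the per-head hop assignment are hypothetical. Each head precomputes a boolean n-hop reachability mask from the adjacency matrix and restricts its attention to pairs of nodes within that hop radius, so the receptive field is controlled explicitly per head and attention outside the mask is exactly zero.

```python
import numpy as np

def n_hop_mask(adj: np.ndarray, n_hops: int) -> np.ndarray:
    """True at (i, j) where node j lies within n_hops of node i (self included)."""
    num_nodes = adj.shape[0]
    reach = np.eye(num_nodes)   # 0 hops: every node reaches itself
    power = np.eye(num_nodes)
    for _ in range(n_hops):
        power = power @ adj     # counts walks of length k
        reach = reach + power   # accumulate reachability up to k hops
    return reach > 0

def hop_masked_attention(x, w_q, w_k, w_v, mask):
    """One attention head restricted to a boolean reachability mask."""
    d = w_q.shape[1]
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)             # forbid out-of-hop pairs
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v

def multi_head_hop_attention(x, adj, heads_hops, d_head, rng):
    """Each head attends within its own hop radius; outputs are concatenated."""
    d_in = x.shape[1]
    outs = []
    for n_hops in heads_hops:
        w_q, w_k, w_v = (rng.standard_normal((d_in, d_head)) for _ in range(3))
        outs.append(hop_masked_attention(x, w_q, w_k, w_v, n_hop_mask(adj, n_hops)))
    return np.concatenate(outs, axis=1)
```

In this sketch the dense mask is only for clarity; the linear scaling claimed in the abstract would come from storing the mask sparsely and computing scores only for the surviving node pairs, whose count grows with mask sparsity rather than quadratically with graph size.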