🤖 AI Summary
Existing graph neural networks and graph Transformers struggle to model long-range dependencies, and the problem worsens in out-of-range generalization, where test instances involve interactions over longer distances than those seen during training. This work proposes the Graph Hierarchical Recurrence (GHR) framework, which operates jointly on the input graph and on a hierarchical abstraction of it obtained through pooling, and incorporates recurrence. Despite its simple design, GHR improves both long-range dependency modeling and out-of-range generalization. Across a broad set of long-range benchmarks, it outperforms existing graph models while using as little as 1% of the parameters of current state-of-the-art models.
📝 Abstract
Graph Neural Networks (GNNs) and Graph Transformers (GTs) are now a fundamental paradigm for graph learning, combining the representation-learning capabilities of deep models with the sample efficiency induced by their inductive biases. Despite their effectiveness, a large body of work has shown that these models still face fundamental limitations in tasks that require capturing correlations between distant regions of a graph. To address this issue, we introduce Graph Hierarchical Recurrence (GHR), a novel framework that operates jointly on the input graph and on a hierarchical abstraction obtained through pooling. We also show that the limitations of existing models are even more pronounced in out-of-range generalization, where test instances involve interactions over distances longer than those observed during training. By contrast, despite its simple design, GHR provides three key advantages: strong performance on long-range dependencies, improved out-of-range generalization, and high parameter efficiency. To corroborate these claims, we show that across a broad set of long-range benchmarks, GHR consistently outperforms existing graph models while using as little as 1% of the parameters of current state-of-the-art models. These results suggest a complementary direction to the current trend of scaling architectures to obtain graph foundation models, indicating that increased model capacity alone may not be sufficient for generalization.