An end-to-end attention-based approach for learning on graphs

📅 2024-02-16
🏛️ Nature Communications
📈 Citations: 2
Influential: 0
🤖 AI Summary
Existing graph Transformers suffer from limited effectiveness, poor scalability, and high preprocessing complexity, often failing to outperform simple GNNs. To address this, we propose a pure-attention graph learning framework that treats edge sets, rather than nodes, as the fundamental modeling unit, eliminating conventional node-centric representations and hand-crafted message passing. Our method introduces vertically interleaved masked and standard self-attention encoders, coupled with attention-based pooling, for end-to-end differentiable training. It requires no graph reconstruction or preprocessing, and natively supports heterophilous graphs and transfer learning. Evaluated across more than 70 node- and graph-level benchmark tasks, our approach consistently surpasses tuned GNN baselines and state-of-the-art graph Transformers. It achieves new state-of-the-art results on molecular graph classification, vision-based graph recognition, heterophilous node classification, and cross-domain transfer, while maintaining both high accuracy and linear scalability.
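The core idea of the summary above can be illustrated with a minimal NumPy sketch: edges (not nodes) are the attention tokens, a masked self-attention layer is interleaved with an unmasked one, and a final attention-based pooling step produces a graph-level representation. This is not the authors' implementation; the masking rule (an edge attends to edges sharing an endpoint), the single-head weight-free attention, and the seed-vector pooling are simplifying assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, mask=None):
    # single-head scaled dot-product self-attention over edge tokens
    # (weight matrices omitted for brevity)
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # block attention outside the mask
    return softmax(scores, axis=-1) @ X

def attention_pool(X):
    # pool edge tokens into one graph vector using attention
    # weights computed against a seed vector (here: the mean token)
    seed = X.mean(axis=0)
    w = softmax(X @ seed / np.sqrt(X.shape[-1]))
    return w @ X

# toy graph: a 4-cycle, with random 8-dim edge features as tokens
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
rng = np.random.default_rng(0)
X = rng.normal(size=(len(edges), 8))

# assumed mask: edge i may attend to edge j iff they share an endpoint
mask = np.array([[bool(set(e) & set(f)) for f in edges] for e in edges])

# interleave one masked and one unmasked attention layer, then pool
H = self_attention(self_attention(X, mask=mask))
graph_repr = attention_pool(H)
print(graph_repr.shape)  # (8,)
```

The masked layer restricts information flow to the graph's connectivity, while the unmasked layer lets every edge attend to every other, which is how the encoder can compensate for possible misspecifications in the input graph.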

📝 Abstract
There has been a recent surge in transformer-based architectures for learning on graphs, mainly motivated by attention as an effective learning mechanism and the desire to supersede the hand-crafted operators characteristic of message passing schemes. However, concerns over their empirical effectiveness, scalability, and complexity of the pre-processing steps have been raised, especially in relation to much simpler graph neural networks that typically perform on par with them across a wide range of benchmarks. To address these shortcomings, we consider graphs as sets of edges and propose a purely attention-based approach consisting of an encoder and an attention pooling mechanism. The encoder vertically interleaves masked and vanilla self-attention modules to learn an effective representation of edges while allowing for tackling possible misspecifications in input graphs. Despite its simplicity, the approach outperforms fine-tuned message passing baselines and recently proposed transformer-based methods on more than 70 node and graph-level tasks, including challenging long-range benchmarks. Moreover, we demonstrate state-of-the-art performance across different tasks, ranging from molecular to vision graphs, and heterophilous node classification. The approach also outperforms graph neural networks and transformers in transfer learning settings and scales much better than alternatives with a similar performance level or expressive power.
Problem

Research questions and friction points this paper is trying to address.

Addressing scalability and complexity in graph transformer models
Improving graph learning with attention-based edge representations
Enhancing performance across diverse node and graph-level tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention-based edge set encoder
Masked and vanilla self-attention interleaving
Linear scalability while outperforming GNNs and graph transformers
David Buterez
Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
J. Janet
Molecular AI, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
Dino Oglic
AstraZeneca Cambridge
Machine Learning, Kernel Methods, Learning Theory, Representation Learning, Drug Design
Pietro Lió
Department of Computer Science and Technology, University of Cambridge, Cambridge, UK