An end-to-end attention-based approach for learning on graphs

📅 2024-02-16
🏛️ Nature Communications
📈 Citations: 2
Influential: 0
🤖 AI Summary
Existing graph Transformers suffer from limited effectiveness, poor scalability, and high preprocessing complexity, often failing to outperform simple GNNs. To address this, we propose a pure-attention graph learning framework that treats edge sets, rather than nodes, as the fundamental modeling unit, eliminating conventional node-centric representations and hand-crafted message passing. Our method introduces vertically interleaved masked and standard self-attention encoders, coupled with attention-based pooling, for end-to-end differentiable training. It requires no graph reconstruction or preprocessing, and natively supports heterophilous graphs and transfer learning. Evaluated across more than 70 node- and graph-level benchmark tasks, our approach consistently surpasses tuned GNN baselines and state-of-the-art graph Transformers. It achieves new state-of-the-art results on molecular graph classification, vision-based graph recognition, heterophilous node classification, and cross-domain transfer, while maintaining both high accuracy and linear scalability.
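The core idea of the summary above can be illustrated with a minimal NumPy sketch: edges (not nodes) are the attention tokens, a masked self-attention layer is interleaved with an unmasked one, and a final attention-based pooling step produces a graph-level representation. This is not the authors' implementation; the masking rule (an edge attends to edges sharing an endpoint), the single-head weight-free attention, and the seed-vector pooling are simplifying assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, mask=None):
    # single-head scaled dot-product self-attention over edge tokens
    # (weight matrices omitted for brevity)
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # block attention outside the mask
    return softmax(scores, axis=-1) @ X

def attention_pool(X):
    # pool edge tokens into one graph vector using attention
    # weights computed against a seed vector (here: the mean token)
    seed = X.mean(axis=0)
    w = softmax(X @ seed / np.sqrt(X.shape[-1]))
    return w @ X

# toy graph: a 4-cycle, with random 8-dim edge features as tokens
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
rng = np.random.default_rng(0)
X = rng.normal(size=(len(edges), 8))

# assumed mask: edge i may attend to edge j iff they share an endpoint
mask = np.array([[bool(set(e) & set(f)) for f in edges] for e in edges])

# interleave one masked and one unmasked attention layer, then pool
H = self_attention(self_attention(X, mask=mask))
graph_repr = attention_pool(H)
print(graph_repr.shape)  # (8,)
```

The masked layer restricts information flow to the graph's connectivity, while the unmasked layer lets every edge attend to every other, which is how the encoder can compensate for possible misspecifications in the input graph.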

📝 Abstract
There has been a recent surge in transformer-based architectures for learning on graphs, mainly motivated by attention as an effective learning mechanism and the desire to supersede the hand-crafted operators characteristic of message passing schemes. However, concerns over their empirical effectiveness, scalability, and complexity of the pre-processing steps have been raised, especially in relation to much simpler graph neural networks that typically perform on par with them across a wide range of benchmarks. To address these shortcomings, we consider graphs as sets of edges and propose a purely attention-based approach consisting of an encoder and an attention pooling mechanism. The encoder vertically interleaves masked and vanilla self-attention modules to learn an effective representation of edges while allowing for tackling possible misspecifications in input graphs. Despite its simplicity, the approach outperforms fine-tuned message passing baselines and recently proposed transformer-based methods on more than 70 node and graph-level tasks, including challenging long-range benchmarks. Moreover, we demonstrate state-of-the-art performance across different tasks, ranging from molecular to vision graphs, and heterophilous node classification. The approach also outperforms graph neural networks and transformers in transfer learning settings and scales much better than alternatives with a similar performance level or expressive power.
Problem

Research questions and friction points this paper is trying to address.

Addressing scalability and complexity in graph transformer models
Improving graph learning with attention-based edge representations
Enhancing performance across diverse node and graph-level tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention-based edge set encoder
Masked and vanilla self-attention interleaving
Linear scalability while outperforming GNNs and graph transformers
David Buterez
Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
J. Janet
Molecular AI, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
Dino Oglic
AstraZeneca Cambridge
Machine Learning, Kernel Methods, Learning Theory, Representation Learning, Drug Design
Pietro Lió
Department of Computer Science and Technology, University of Cambridge, Cambridge, UK