Transformers are Graph Neural Networks

📅 2025-06-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the theoretical connection between Transformers and Graph Neural Networks (GNNs), challenging the conventional view of Transformers as purely sequence-based models. Method: the Transformer is formalized as a message-passing GNN operating on a fully connected token graph, where self-attention implements dynamic, content-aware neighborhood aggregation and positional encodings inject structural priors. Transformer computation is reinterpreted within a unified message-passing framework, and the notion of the "hardware lottery" is invoked to argue that the Transformer's dominance stems not only from architectural design but also from the fact that modern accelerators are optimized for its dense matrix operations. Contribution/Results: (1) a rigorous conceptual bridge between sequence modeling in NLP and graph representation learning; (2) identification of the dual origins of Transformer effectiveness, namely structural modeling capacity and hardware alignment; and (3) a perspective that informs interpretable modeling and hardware-aware neural architecture design.
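The message-passing view summarized above can be made concrete with a minimal NumPy sketch (an illustration of the idea, not code from the paper): single-head self-attention is written as every token aggregating messages from every other token on a fully connected graph, with attention scores acting as dynamic edge weights.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_as_message_passing(H, Wq, Wk, Wv):
    """Single-head self-attention viewed as message passing on a
    fully connected token graph: each token i aggregates messages
    from all tokens j, weighted by content-based attention scores."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    d = Q.shape[-1]
    # Attention weights = dynamic, content-aware edge weights of the graph.
    A = softmax(Q @ K.T / np.sqrt(d), axis=-1)  # shape (n_tokens, n_tokens)
    # Neighborhood aggregation over the (complete) neighborhood of each token.
    return A @ V

# Toy usage: 4 tokens with 8-dimensional features (arbitrary sizes).
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = attention_as_message_passing(H, Wq, Wk, Wv)  # shape (4, 8)
```

Positional encodings would simply be added to `H` before this step, which is how sequential structure enters an otherwise permutation-equivariant aggregation.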

📝 Abstract
We establish connections between the Transformer architecture, originally introduced for natural language processing, and Graph Neural Networks (GNNs) for representation learning on graphs. We show how Transformers can be viewed as message-passing GNNs operating on fully connected graphs of tokens, where the self-attention mechanism captures the relative importance of all tokens with respect to each other, and positional encodings provide hints about sequential ordering or structure. Thus, Transformers are expressive set-processing networks that learn relationships among input elements without being constrained by a priori graphs. Despite this mathematical connection to GNNs, Transformers are implemented via dense matrix operations that are significantly more efficient on modern hardware than sparse message passing. This leads to the perspective that Transformers are GNNs currently winning the hardware lottery.
Problem

Research questions and friction points this paper is trying to address.

How are Transformers related to Graph Neural Networks?
Can Transformer computation be expressed as message passing on a graph?
Why do Transformers outperform sparse message-passing GNNs in efficiency on modern hardware?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Framing the Transformer as a message-passing GNN on a fully connected token graph
Self-attention as content-aware aggregation of token relationships
Dense matrix operations as the source of hardware efficiency (the "hardware lottery")