Towards Mechanistic Interpretability of Graph Transformers via Attention Graphs

📅 2025-02-17
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the interpretability of Graph Neural Networks (GNNs) and Graph Transformers, aiming to uncover how information flows between nodes and how learned structural representations relate to the original graph topology. To this end, the authors propose Attention Graphs: a method that aggregates multi-layer, multi-head self-attention matrices into an explicit information-propagation graph, providing a unified message-passing view of both model families. Key contributions include: (i) grounding the method in the mathematical equivalence between GNN message passing and Transformer self-attention; (ii) showing that on heterophilous graphs, different high-performing models can exhibit markedly distinct information-flow patterns despite comparable accuracy; and (iii) demonstrating that, under fully connected (all-to-all) attention, the learned structures correlate only weakly with the original graph. Experiments on homophilous and heterophilous node classification benchmarks show that Attention Graphs expose model inductive biases. The implementation is publicly available for reproducibility.
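The core aggregation step can be sketched in a few lines. The paper's exact scheme may differ; this sketch assumes an attention-rollout-style composition (average heads within each layer, then compose layers by matrix product) to build a single node-to-node information-flow matrix:

```python
import numpy as np

def attention_graph(attn):
    """Aggregate per-layer, per-head attention into one information-flow graph.

    attn: array of shape (num_layers, num_heads, n, n) holding row-stochastic
    attention matrices. NOTE: this is an illustrative aggregation, not
    necessarily the paper's exact construction.
    """
    # Average the heads within each layer -> (num_layers, n, n)
    per_layer = attn.mean(axis=1)
    # Compose layers by matrix product: entry (i, j) of the result describes
    # how much information node i receives from node j across all layers.
    flow = per_layer[0]
    for layer_attn in per_layer[1:]:
        flow = layer_attn @ flow
    return flow
```

Since each layer's matrix is row-stochastic, the composed flow matrix is row-stochastic as well, so each row can be read as a distribution over source nodes.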

๐Ÿ“ Abstract
We introduce Attention Graphs, a new tool for mechanistic interpretability of Graph Neural Networks (GNNs) and Graph Transformers based on the mathematical equivalence between message passing in GNNs and the self-attention mechanism in Transformers. Attention Graphs aggregate attention matrices across Transformer layers and heads to describe how information flows among input nodes. Through experiments on homophilous and heterophilous node classification tasks, we analyze Attention Graphs from a network science perspective and find that: (1) When Graph Transformers are allowed to learn the optimal graph structure using all-to-all attention among input nodes, the Attention Graphs learned by the model do not tend to correlate with the input/original graph structure; and (2) For heterophilous graphs, different Graph Transformer variants can achieve similar performance while utilising distinct information flow patterns. Open source code: https://github.com/batu-el/understanding-inductive-biases-of-gnns
Problem

Research questions and friction points this paper is trying to address.

Interpretability of Graph Transformers
Information flow in neural networks
Performance on heterophilous graphs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention Graphs as a mechanistic interpretability tool for Graph Transformers
Aggregation of attention matrices across layers and heads
Network-science analysis of information flow in node classification
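Finding (1) above, that learned Attention Graphs need not mirror the input graph, suggests a simple sanity check: correlate aggregated attention weights with the input adjacency over node pairs. The function below is a hypothetical illustration of such a check, not the paper's exact analysis:

```python
import numpy as np

def attention_adjacency_correlation(flow, adj):
    """Pearson correlation between Attention Graph edge weights and the
    input adjacency, over all ordered node pairs excluding self-loops.

    flow: (n, n) aggregated attention graph; adj: (n, n) binary adjacency.
    NOTE: illustrative check; the paper's network-science analysis may use
    different measures.
    """
    n = flow.shape[0]
    mask = ~np.eye(n, dtype=bool)  # ignore self-loops
    x = flow[mask]
    y = adj[mask].astype(float)
    return np.corrcoef(x, y)[0, 1]
```

A value near 0 would indicate the weak correlation with the original topology that the paper reports under all-to-all attention; a value near 1 would indicate the model rediscovered the input edges.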