🤖 AI Summary
Graph Transformers (GTs) mitigate over-smoothing in GNNs via global attention, but we show that this global attention itself suffers from severe over-smoothing, degrading node representations. To address this, we propose a PageRank-enhanced attention mechanism, the first to theoretically embed PageRank into a Transformer architecture. The resulting attention acts as a graph-structure-aware adaptive-pass filter that overcomes the inherent low-pass limitation of conventional GTs while preserving both global context and hierarchical structural information. The method comprises four components: PageRank-guided sparse attention, bandpass spectral filtering, structure-aware positional encoding, and a linear-complexity implementation. Extensive experiments on 11 diverse graph datasets, ranging from thousands to millions of nodes, demonstrate consistent and significant improvements over state-of-the-art methods on both node and graph classification tasks. The code is publicly available.
📝 Abstract
Graph Transformers (GTs) have emerged as a promising tool for graph learning, leveraging their all-pair connectivity to effectively capture global information. Global attention was originally introduced to address the over-smoothing problem in deep GNNs, eliminating the need to stack deep GNN layers. However, through empirical and theoretical analysis, we verify that this global attention itself exhibits severe over-smoothing: its inherent low-pass filtering causes node representations to become indistinguishable, and the effect is even stronger than that observed in GNNs. To mitigate this, we propose PageRank Transformer (ParaFormer), which features a PageRank-enhanced attention module designed to mimic the behavior of deep Transformers. We demonstrate, both theoretically and empirically, that ParaFormer mitigates over-smoothing by functioning as an adaptive-pass filter. Experiments show that ParaFormer achieves consistent performance improvements on both node classification and graph classification tasks across 11 datasets ranging from thousands to millions of nodes, validating its efficacy. The supplementary material, including code and appendix, is available at https://github.com/chaohaoyuan/ParaFormer.
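For readers who want a concrete picture, below is a minimal PyTorch sketch of what a PageRank-enhanced attention layer could look like. It is an illustration only, not the ParaFormer implementation: the power-iteration approximation of personalized PageRank, the simple 50/50 mixing of attention and PageRank scores, and all names (`PageRankAttention`, `personalized_pagerank`, `alpha`) are assumptions made for this sketch; see the paper and repository for the actual module.

```python
# Minimal sketch of a "PageRank-enhanced attention" layer (illustrative only).
# NOT the authors' implementation: the PPR approximation and the way PPR is
# blended with dense attention below are assumptions for demonstration.
import torch
import torch.nn as nn
import torch.nn.functional as F


def personalized_pagerank(adj: torch.Tensor, alpha: float = 0.15, iters: int = 20) -> torch.Tensor:
    """Power-iteration approximation of the personalized PageRank matrix.

    adj: dense [N, N] adjacency matrix.
    Returns an [N, N] matrix whose row i is the PPR vector personalized to node i.
    """
    deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
    trans = adj / deg                        # row-stochastic transition matrix
    restart = torch.eye(adj.size(0))         # restart (personalization) distribution
    ppr = restart.clone()
    for _ in range(iters):
        ppr = alpha * restart + (1.0 - alpha) * ppr @ trans
    return ppr


class PageRankAttention(nn.Module):
    """Single-head attention whose scores are re-weighted by graph structure via PPR."""

    def __init__(self, dim: int, alpha: float = 0.15):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))
        self.alpha = alpha

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Standard scaled dot-product attention over all node pairs.
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        attn = F.softmax(scores, dim=-1)
        # Mix the attention matrix with the PPR matrix so that graph structure
        # modulates the all-pair attention (one simple "PageRank-enhanced" variant).
        ppr = personalized_pagerank(adj, alpha=self.alpha)
        mixed = 0.5 * attn + 0.5 * ppr
        mixed = mixed / mixed.sum(dim=-1, keepdim=True)
        return mixed @ v


if __name__ == "__main__":
    n, d = 6, 16
    x = torch.randn(n, d)
    adj = (torch.rand(n, n) > 0.6).float()
    adj = ((adj + adj.t()) > 0).float()      # symmetrize the random graph
    out = PageRankAttention(d)(x, adj)
    print(out.shape)                         # torch.Size([6, 16])
```

The sketch keeps the dense all-pair attention of a standard Transformer and only adds a structure-dependent re-weighting term; a faithful reproduction of ParaFormer's sparse, linear-complexity attention and its spectral-filter interpretation should follow the released code instead.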