🤖 AI Summary
This work addresses the high computational complexity of existing graph Transformer models and their limited out-of-distribution (OOD) generalization due to node-level attention mechanisms. To overcome these challenges, the authors propose VecFormer, a novel approach built on a two-stage training paradigm. In the first stage, two vector-quantized codebooks are employed to reconstruct node features and graph structure, yielding semantically rich discrete graph representations. In the second stage, attention is computed at the level of graph tokens rather than individual nodes. This design substantially reduces computational overhead while improving OOD generalization. Experimental results demonstrate that VecFormer achieves both faster inference and superior performance on node classification tasks across datasets of varying scales.
📝 Abstract
Graph Transformers have demonstrated impressive capabilities in graph representation learning. However, existing approaches face two critical challenges: (1) most models suffer from computational complexity that grows quadratically with the number of nodes, making it difficult to scale to large graphs; (2) attention mechanisms based on node-level operations limit model flexibility and lead to poor generalization in out-of-distribution (OOD) scenarios. To address these issues, we propose \textbf{VecFormer} (the \textbf{Vec}tor Quantized Graph Trans\textbf{former}), an efficient and highly generalizable model for node classification, particularly under OOD settings. VecFormer adopts a two-stage training paradigm. In the first stage, two codebooks are used to reconstruct the node features and the graph structure, aiming to learn semantically rich \texttt{Graph Codes}. In the second stage, attention is performed at the \texttt{Graph Token} level based on the transformed cross codebook, reducing computational complexity while enhancing the model's generalization capability. Extensive experiments on datasets of various sizes demonstrate that VecFormer outperforms existing graph Transformers in both accuracy and speed.
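The core vector-quantization step described above can be illustrated with a minimal sketch: each node embedding is assigned to its nearest codebook entry, so a graph of N nodes is summarized by at most K discrete codes, which is what allows attention to operate over far fewer tokens than nodes. All names, shapes, and the fixed (non-learned) codebook below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

num_nodes, dim, codebook_size = 6, 4, 3
node_embeddings = rng.normal(size=(num_nodes, dim))  # hypothetical node features
codebook = rng.normal(size=(codebook_size, dim))     # would be learned in stage 1

# Squared Euclidean distance from every node to every codebook entry
# (broadcasting gives a (num_nodes, codebook_size) distance matrix).
dists = ((node_embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)

codes = dists.argmin(axis=1)   # discrete code index per node ("graph code")
quantized = codebook[codes]    # quantized node representations

print(codes.shape, quantized.shape)  # (6,) (6, 4)
```

In a full model the codebook would be trained jointly with feature- and structure-reconstruction losses, and second-stage attention would then run over the (much smaller) set of distinct codes rather than over all nodes.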