🤖 AI Summary
Existing graph transformers (GTs) suffer from two key bottlenecks: (1) entanglement of multi-view information (positional, structural, and attribute features), which limits interpretability and design flexibility; and (2) conflation of local message passing with global self-attention, which leads to overfitting and over-globalizing. To address these issues, we propose DeGTA, a decoupled graph triple attention network that splits self-attention into three separate modules (positional, structural, and attribute attention) and treats local and global interaction as two distinct levels. Computing each view's attention separately and then adaptively integrating local and global information yields decoupled multi-view representations. DeGTA achieves state-of-the-art performance on node and graph classification across multiple benchmark datasets, and ablation studies indicate that decoupling is essential for both performance and interpretability.
📝 Abstract
Graph Transformers (GTs) have recently achieved significant success in the graph domain by effectively capturing both long-range dependencies and graph inductive biases. However, these methods face two primary challenges: (1) multi-view chaos, which results from coupling multi-view information (positional, structural, and attribute) and thereby impedes flexible usage and the interpretability of the propagation process; and (2) local-global chaos, which arises from coupling local message passing with global attention and leads to overfitting and over-globalizing. To address these challenges, we propose a high-level decoupled perspective on GTs that breaks them down into three components (positional attention, structural attention, and attribute attention) and two interaction levels (local and global). Based on this decoupled perspective, we design a decoupled graph triple attention network named DeGTA, which separately computes multi-view attentions and adaptively integrates multi-view local and global information. This approach offers three key advantages: enhanced interpretability, flexible design, and adaptive integration of local and global information. In extensive experiments, DeGTA achieves state-of-the-art performance across various datasets and tasks, including node classification and graph classification. Comprehensive ablation studies demonstrate that decoupling is essential for improving performance and enhancing interpretability. Our code is available at: https://github.com/wangxiaotang0906/DeGTA
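The decoupled perspective can be illustrated with a small, dependency-free Python sketch. It is not the DeGTA implementation (that lives in the linked repository). The identity query/key projections, the softmax-normalized view weights `view_logits`, the scalar sigmoid `gate` that blends local and global outputs, and all function names are hypothetical choices made for illustration. Each view (positional, structural, attribute) gets its own attention map, computed once under a neighbour mask (local level) and once over all node pairs (global level). The per-view maps are then mixed, and the two levels are blended by the gate.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    # Numerically stable softmax; masked entries (-inf) receive exactly 0 weight.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_masks(adj: List[List[int]]) -> Tuple[List[List[bool]], List[List[bool]]]:
    # Local mask: self-loop plus 1-hop neighbours; global mask: every node pair.
    n = len(adj)
    local = [[i == j or bool(adj[i][j]) for j in range(n)] for i in range(n)]
    glob = [[True] * n for _ in range(n)]
    return local, glob


def view_attention(feats: Matrix, mask: List[List[bool]]) -> Matrix:
    # Scaled dot-product attention for ONE view; identity Q/K projections (assumption).
    d = len(feats[0])
    return [
        softmax([
            sum(a * b for a, b in zip(fi, fj)) / math.sqrt(d) if mask[i][j] else -math.inf
            for j, fj in enumerate(feats)
        ])
        for i, fi in enumerate(feats)
    ]


def mix_views(views: Dict[str, Matrix], weights: Dict[str, float]) -> Matrix:
    # Convex combination of per-view attention matrices (weights sum to 1).
    n = len(next(iter(views.values())))
    return [[sum(weights[v] * views[v][i][j] for v in views) for j in range(n)]
            for i in range(n)]


def aggregate(attn: Matrix, values: Matrix) -> Matrix:
    # Attention-weighted sum of value vectors (here the raw attribute features).
    d = len(values[0])
    return [[sum(row[j] * values[j][k] for j in range(len(values))) for k in range(d)]
            for row in attn]


def decoupled_triple_attention(pos, struct, attr, adj, view_logits, gate):
    local_mask, global_mask = build_masks(adj)
    weights = dict(zip(("pos", "struct", "attr"), softmax(view_logits)))
    out = {}
    for level, mask in (("local", local_mask), ("global", global_mask)):
        # Each view gets its own attention map before any mixing.
        views = {"pos": view_attention(pos, mask),
                 "struct": view_attention(struct, mask),
                 "attr": view_attention(attr, mask)}
        out[level] = aggregate(mix_views(views, weights), attr)
    # Hypothetical adaptive local/global integration via a sigmoid gate.
    alpha = 1.0 / (1.0 + math.exp(-gate))
    return [[alpha * l + (1.0 - alpha) * g for l, g in zip(hl, hg)]
            for hl, hg in zip(out["local"], out["global"])]


# Toy path graph 0 - 1 - 2 with illustrative per-view features.
pos = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
struct = [[1.0], [2.0], [1.0]]  # e.g. node degree
attr = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
h = decoupled_triple_attention(pos, struct, attr, adj, view_logits=[0.0, 0.0, 0.0], gate=0.0)
```

Driving `gate` strongly positive recovers purely local propagation, and driving it strongly negative recovers purely global attention. Because each view keeps its own attention matrix, any one of them can be inspected or ablated on its own, which is the kind of interpretability the decoupled perspective aims for.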