🤖 AI Summary
Existing graph Transformers face two key challenges in heterogeneous graph learning: (i) quadratic computational complexity, which hinders scalability, and (ii) difficulty disentangling heterogeneous semantics across multiple node types. To address these, the authors propose HeSRN, a Heterogeneous Slot-aware Retentive Network built around a novel slot-aware structure encoder. The encoder decouples type-specific semantics via slot projection, slot normalization, and retention-based fusion, avoiding the semantic confusion caused by forcing all node types into a single feature space. HeSRN further replaces self-attention with a linear-complexity retention mechanism and stacks multi-scale heterogeneous retention layers to jointly capture local structural patterns and global heterogeneous semantics. Evaluated on four real-world heterogeneous graph benchmarks, HeSRN achieves state-of-the-art node classification performance, outperforming mainstream GNNs and graph Transformers in both accuracy and computational efficiency.
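To make the slot idea concrete, here is a minimal PyTorch sketch of a slot-aware projection with per-type normalization. The class name `SlotProjection`, the use of `LayerNorm` for distribution alignment, and all dimensions are illustrative assumptions, not the paper's actual implementation; retention-based fusion is omitted.

```python
import torch
import torch.nn as nn

class SlotProjection(nn.Module):
    """Hypothetical sketch of a slot-aware structure encoder: each node
    type gets its own linear projection (a "slot") followed by a per-slot
    LayerNorm, so type-specific semantics stay in separate, distribution-
    aligned subspaces instead of being forced into one shared space."""

    def __init__(self, in_dims: dict, slot_dim: int):
        super().__init__()
        # One projection and one normalization per node type (slot).
        self.proj = nn.ModuleDict({t: nn.Linear(d, slot_dim) for t, d in in_dims.items()})
        self.norm = nn.ModuleDict({t: nn.LayerNorm(slot_dim) for t in in_dims})

    def forward(self, feats: dict) -> dict:
        # Project each type's raw features into its own slot, then
        # align the slot distributions via normalization.
        return {t: self.norm[t](self.proj[t](x)) for t, x in feats.items()}

# Toy usage: papers and authors with different raw feature widths
# are mapped into aligned 64-d slots.
enc = SlotProjection({"paper": 128, "author": 32}, slot_dim=64)
out = enc({"paper": torch.randn(5, 128), "author": torch.randn(3, 32)})
print({t: v.shape for t, v in out.items()})
```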
📝 Abstract
Graph Transformers have recently achieved remarkable progress in graph representation learning by capturing long-range dependencies through self-attention. However, their quadratic computational complexity and inability to effectively model heterogeneous semantics severely limit their scalability and generalization on real-world heterogeneous graphs. To address these issues, we propose HeSRN, a novel Heterogeneous Slot-aware Retentive Network for efficient and expressive heterogeneous graph representation learning. HeSRN introduces a slot-aware structure encoder that explicitly disentangles node-type semantics by projecting heterogeneous features into independent slots and aligning their distributions through slot normalization and retention-based fusion, mitigating the semantic entanglement caused by forced feature-space unification in previous Transformer-based models. We further replace the self-attention mechanism with a retention-based encoder, which models structural and contextual dependencies in linear time while retaining strong expressive power. A heterogeneous retentive encoder then jointly captures local structural signals and global heterogeneous semantics through multi-scale retention layers. Extensive experiments on four real-world heterogeneous graph datasets demonstrate that HeSRN consistently outperforms state-of-the-art heterogeneous graph neural networks and graph Transformer baselines on node classification, achieving superior accuracy with significantly lower computational complexity.
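For readers unfamiliar with retention, the sketch below shows the recurrent form of single-head retention popularized by RetNet (Sun et al., 2023), which is the kind of linear-time attention replacement the abstract describes. The heterogeneous, multi-scale extensions in HeSRN are not shown, and the decay value `gamma` is an illustrative assumption.

```python
import torch

def recurrent_retention(q, k, v, gamma: float = 0.9):
    """Minimal single-head recurrent retention (after RetNet). A state
    matrix S accumulates outer products k_n^T v_n with exponential decay
    gamma, so each step costs O(d^2) regardless of sequence length,
    instead of attending over all prior tokens.
    Shapes: q, k, v are (seq_len, d)."""
    seq_len, d = q.shape
    state = torch.zeros(d, d)
    outputs = []
    for n in range(seq_len):
        # S_n = gamma * S_{n-1} + k_n^T v_n  (decayed rank-1 state update)
        state = gamma * state + torch.outer(k[n], v[n])
        # o_n = q_n S_n: read out against the decayed running state
        outputs.append(q[n] @ state)
    return torch.stack(outputs)

# Toy usage on a sequence of 6 tokens with 8-d heads.
q, k, v = (torch.randn(6, 8) for _ in range(3))
print(recurrent_retention(q, k, v).shape)  # torch.Size([6, 8])
```

Because the state update is associative, the same computation also admits a fully parallel training form, which is how retention keeps Transformer-like training throughput while offering linear-time inference.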