🤖 AI Summary
Dynamic graph temporal link prediction suffers from the quadratic time complexity of Transformer self-attention, which severely limits scalability in high-frequency and large-scale scenarios. To address this, we propose GLFormer, a Transformer-style architecture without self-attention. GLFormer replaces self-attention with an adaptive token mixer, introduces a context-aware local aggregation mechanism that jointly models interaction order and temporal intervals, and employs hierarchical temporal aggregation to capture long-range dependencies. The resulting model is a pure MLP-based architecture with learnable positional encodings and feed-forward networks. Evaluated on six mainstream dynamic graph benchmarks, GLFormer achieves state-of-the-art performance while significantly improving training and inference efficiency. These results demonstrate that attention-free architectures can model dynamic graphs effectively, efficiently, and at scale, offering a compelling alternative to attention-centric designs.
📝 Abstract
Dynamic graph learning plays a pivotal role in modeling evolving relationships over time, especially for temporal link prediction tasks in domains such as traffic systems, social networks, and recommendation platforms. While Transformer-based models have demonstrated strong performance by capturing long-range temporal dependencies, their reliance on self-attention results in quadratic complexity with respect to sequence length, limiting scalability on high-frequency or large-scale graphs. In this work, we revisit the necessity of self-attention in dynamic graph modeling. Inspired by recent findings that attribute the success of Transformers more to their architectural design than to attention itself, we propose GLFormer, a novel attention-free Transformer-style framework for dynamic graphs. GLFormer introduces an adaptive token mixer that performs context-aware local aggregation based on interaction order and time intervals. To capture long-term dependencies, we further design a hierarchical aggregation module that expands the temporal receptive field by stacking local token mixers across layers. Experiments on six widely used dynamic graph benchmarks show that GLFormer achieves state-of-the-art performance, demonstrating that attention-free architectures can match or surpass Transformer baselines in dynamic graph settings with significantly improved efficiency.
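To make the core idea concrete, here is a minimal sketch of what an attention-free local token mixer with hierarchical stacking might look like. This is an illustration of the general technique, not the paper's actual implementation: the window size, the recency-based weighting of time intervals, and all function and variable names are assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_token_mixer(x, t, w=3, offset_bias=None):
    """Hypothetical context-aware local aggregation: each token becomes a
    weighted sum of its w most recent predecessors (including itself),
    with weights conditioned on interaction order (a learnable per-offset
    bias) and on the elapsed time intervals -- no self-attention involved."""
    n, d = x.shape
    if offset_bias is None:
        # stand-in for a learned parameter, one bias per window offset
        offset_bias = rng.normal(scale=0.1, size=(w,))
    out = np.zeros_like(x)
    for i in range(n):
        lo = max(0, i - w + 1)
        idx = np.arange(lo, i + 1)
        # score each neighbor: order bias minus time elapsed since it occurred
        scores = offset_bias[: len(idx)] - (t[i] - t[idx])
        weights = np.exp(scores - scores.max())  # stable softmax
        weights /= weights.sum()
        out[i] = weights @ x[idx]
    return out

# Hierarchical aggregation: stacking local mixers widens the receptive
# field layer by layer (here 2 layers of window 3 cover 5 past events).
x = rng.normal(size=(8, 4))           # 8 interactions, 4-dim embeddings
t = np.sort(rng.uniform(0, 10, 8))    # event timestamps
h = x
for _ in range(2):
    h = local_token_mixer(h, t, w=3)
print(h.shape)  # (8, 4)
```

Each layer costs O(n·w) rather than the O(n²) of self-attention, which is the efficiency argument the abstract makes; long-range dependencies are recovered by depth rather than by global pairwise attention.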