Global-Lens Transformers: Adaptive Token Mixing for Dynamic Link Prediction

📅 2025-11-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Dynamic graph temporal link prediction suffers from the quadratic time complexity of Transformer self-attention, severely limiting scalability in high-frequency and large-scale scenarios. To address this, we propose GLFormer—a Transformer-style architecture devoid of self-attention. It replaces self-attention with an adaptive token mixer, introduces a context-aware local aggregation mechanism that jointly models interaction order and temporal intervals, and employs hierarchical temporal aggregation to capture long-range dependencies. The entire model is a pure MLP-based architecture incorporating learnable positional encoding and feed-forward networks. Evaluated on six mainstream dynamic graph benchmarks, GLFormer achieves state-of-the-art performance while significantly improving training and inference efficiency. Experimental results demonstrate that attention-free architectures can effectively, efficiently, and scalably model dynamic graphs—offering a compelling alternative to attention-centric designs.
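The summary above describes replacing self-attention with an adaptive token mixer that aggregates a node's recent interactions using both their order and their time intervals. The sketch below is a minimal NumPy illustration of that idea, not the paper's implementation: the function name `local_token_mixer`, the learned-parameter names `w_pos` (per-position score) and `w_time` (time-gap scale), and the log-scaled interval feature are all assumptions made for the example.

```python
import numpy as np

def local_token_mixer(tokens, dt, w_pos, w_time, window=4):
    """Attention-free local mixing (illustrative sketch).

    Each token becomes a weighted average of its `window` most recent
    predecessors (itself included). Weights come from interaction order
    (a learned per-position score, here `w_pos`) plus a log-scaled time
    interval term (`w_time`) -- no pairwise query-key products, so the
    cost is O(n * window) rather than O(n^2).
    """
    n, d = tokens.shape
    out = np.zeros_like(tokens)
    for i in range(n):
        lo = max(0, i - window + 1)
        idx = np.arange(lo, i + 1)
        order = i - idx                       # 0 = current, 1 = previous, ...
        gaps = np.log1p(dt[i] - dt[idx])      # elapsed time to each neighbor
        scores = w_pos[order] + w_time * gaps
        weights = np.exp(scores - scores.max())  # stable softmax over the window
        weights /= weights.sum()
        out[i] = weights @ tokens[idx]
    return out
```

In a full model this mixer would sit inside the usual Transformer-style block (residual connection, feed-forward MLP, positional encoding), with `w_pos` and `w_time` learned end to end.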

📝 Abstract
Dynamic graph learning plays a pivotal role in modeling evolving relationships over time, especially for temporal link prediction tasks in domains such as traffic systems, social networks, and recommendation platforms. While Transformer-based models have demonstrated strong performance by capturing long-range temporal dependencies, their reliance on self-attention results in quadratic complexity with respect to sequence length, limiting scalability on high-frequency or large-scale graphs. In this work, we revisit the necessity of self-attention in dynamic graph modeling. Inspired by recent findings that attribute the success of Transformers more to their architectural design than to attention itself, we propose GLFormer, a novel attention-free Transformer-style framework for dynamic graphs. GLFormer introduces an adaptive token mixer that performs context-aware local aggregation based on interaction order and time intervals. To capture long-term dependencies, we further design a hierarchical aggregation module that expands the temporal receptive field by stacking local token mixers across layers. Experiments on six widely used dynamic graph benchmarks show that GLFormer achieves state-of-the-art performance, revealing that attention-free architectures can match or surpass Transformer baselines in dynamic graph settings with significantly improved efficiency.
Problem

Research questions and friction points this paper is trying to address.

Proposes attention-free Transformer for dynamic graph link prediction
Addresses quadratic complexity limitation in temporal dependency modeling
Enables efficient long-range dependency capture in evolving graph relationships
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention-free Transformer framework for dynamic graphs
Adaptive token mixer with local aggregation
Hierarchical aggregation for long-term dependencies
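The hierarchical aggregation point above can be illustrated with a toy sketch: stacking causal local mixers expands the temporal receptive field linearly with depth, so long-range context is reached without any quadratic pairwise step. The simple mean-based `local_mix` below is a hypothetical stand-in for the paper's learned mixer, used only to make the receptive-field arithmetic concrete.

```python
import numpy as np

def local_mix(tokens, window):
    """Causal local averaging over the last `window` tokens
    (a simplified stand-in for a learned local token mixer)."""
    n, d = tokens.shape
    out = np.empty_like(tokens)
    for i in range(n):
        out[i] = tokens[max(0, i - window + 1): i + 1].mean(axis=0)
    return out

def hierarchical_mix(tokens, window, layers):
    """Stack local mixers: after `layers` applications, each token can
    depend on (window - 1) * layers + 1 predecessors, i.e. the receptive
    field grows linearly with depth at linear per-layer cost."""
    for _ in range(layers):
        tokens = local_mix(tokens, window)
    return tokens
```

For example, with `window=3` and `layers=2`, the last of 10 tokens ends up depending on positions 5 through 9, a receptive field of (3 - 1) * 2 + 1 = 5.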
Tao Zou
The Australian National University
Covariance regression, network data modeling, environmental statistics, financial statistics
Chengfeng Wu
Shenzhen Key Laboratory of Ubiquitous Data Enabling, Tsinghua Shenzhen International Graduate School, Shenzhen, China
Tianxi Liao
State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing, China
Junchen Ye
School of Transportation Science and Engineering, Beihang University, Beijing, China
Bowen Du
Beihang University