🤖 AI Summary
Dynamic graph temporal link prediction suffers from the quadratic time complexity of Transformer self-attention, which severely limits scalability in high-frequency and large-scale scenarios. To address this, we propose GLFormer, a Transformer-style architecture without self-attention. GLFormer replaces self-attention with an adaptive token mixer, introduces a context-aware local aggregation mechanism that jointly models interaction order and temporal intervals, and employs hierarchical temporal aggregation to capture long-range dependencies. The resulting model is a pure MLP-based architecture with learnable positional encodings and feed-forward networks. Evaluated on six mainstream dynamic graph benchmarks, GLFormer achieves state-of-the-art performance while significantly improving training and inference efficiency. These results demonstrate that attention-free architectures can model dynamic graphs effectively, efficiently, and at scale, offering a compelling alternative to attention-centric designs.
📝 Abstract
Dynamic graph learning plays a pivotal role in modeling evolving relationships over time, especially for temporal link prediction tasks in domains such as traffic systems, social networks, and recommendation platforms. While Transformer-based models have demonstrated strong performance by capturing long-range temporal dependencies, their reliance on self-attention results in quadratic complexity with respect to sequence length, limiting scalability on high-frequency or large-scale graphs. In this work, we revisit the necessity of self-attention in dynamic graph modeling. Inspired by recent findings that attribute the success of Transformers more to their architectural design than to attention itself, we propose GLFormer, a novel attention-free Transformer-style framework for dynamic graphs. GLFormer introduces an adaptive token mixer that performs context-aware local aggregation based on interaction order and time intervals. To capture long-term dependencies, we further design a hierarchical aggregation module that expands the temporal receptive field by stacking local token mixers across layers. Experiments on six widely used dynamic graph benchmarks show that GLFormer achieves state-of-the-art performance, demonstrating that attention-free architectures can match or surpass Transformer baselines in dynamic graph settings with significantly improved efficiency.
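To make the core idea concrete, here is a minimal sketch of what an attention-free local token mixer with hierarchical stacking might look like. This is an illustration of the general technique, not the paper's actual implementation: the window size, the recency-based weighting of time intervals, and all function and variable names are assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_token_mixer(x, t, w=3, offset_bias=None):
    """Hypothetical context-aware local aggregation: each token becomes a
    weighted sum of its w most recent predecessors (including itself),
    with weights conditioned on interaction order (a learnable per-offset
    bias) and on the elapsed time intervals -- no self-attention involved."""
    n, d = x.shape
    if offset_bias is None:
        # stand-in for a learned parameter, one bias per window offset
        offset_bias = rng.normal(scale=0.1, size=(w,))
    out = np.zeros_like(x)
    for i in range(n):
        lo = max(0, i - w + 1)
        idx = np.arange(lo, i + 1)
        # score each neighbor: order bias minus time elapsed since it occurred
        scores = offset_bias[: len(idx)] - (t[i] - t[idx])
        weights = np.exp(scores - scores.max())  # stable softmax
        weights /= weights.sum()
        out[i] = weights @ x[idx]
    return out

# Hierarchical aggregation: stacking local mixers widens the receptive
# field layer by layer (here 2 layers of window 3 cover 5 past events).
x = rng.normal(size=(8, 4))           # 8 interactions, 4-dim embeddings
t = np.sort(rng.uniform(0, 10, 8))    # event timestamps
h = x
for _ in range(2):
    h = local_token_mixer(h, t, w=3)
print(h.shape)  # (8, 4)
```

Each layer costs O(n·w) rather than the O(n²) of self-attention, which is the efficiency argument the abstract makes; long-range dependencies are recovered by depth rather than by global pairwise attention.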