Do traveling waves make good positional encodings?

📅 2025-11-11
🤖 AI Summary
Absolute positional encodings in Transformers do not directly model relative positional relationships. To address this, the paper proposes RollPE, a positional encoding motivated by the physical analogy of traveling waves. RollPE applies a circular roll (cyclic shift) to the query and key tensors in self-attention, which induces a relative phase shift across positions, so that attention is computed as a function of positional differences rather than absolute indices. On the theoretical side, the authors derive a continuous case of RollPE, which implicitly imposes a topographic structure on the query and key space, and show that RollPE is mathematically equivalent to a particular configuration of RoPE. Empirically, RollPE is reported to significantly outperform traditional absolute positional embeddings and to be comparable to RoPE. The authors suggest that the traveling-wave view may help simplify RoPE and relate it to processes of information flow in the brain.

📝 Abstract
Transformers rely on positional encoding to compensate for the inherent permutation invariance of self-attention. Traditional approaches use absolute sinusoidal embeddings or learned positional vectors, while more recent methods emphasize relative encodings to better capture translation equivariances. In this work, we propose RollPE, a novel positional encoding mechanism based on traveling waves, implemented by applying a circular roll operation to the query and key tensors in self-attention. This operation induces a relative shift in phase across positions, allowing the model to compute attention as a function of positional differences rather than absolute indices. We show this simple method significantly outperforms traditional absolute positional embeddings and is comparable to RoPE. We derive a continuous case of RollPE which implicitly imposes a topographic structure on the query and key space. We further derive a mathematical equivalence of RollPE to a particular configuration of RoPE. Viewing RollPE through the lens of traveling waves may allow us to simplify RoPE and relate it to processes of information flow in the brain.
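The relative-position property described in the abstract can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes that the "circular roll" shifts each token's query and key vector along the feature axis by that token's integer position, and it omits projections, scaling, and softmax. The function names `roll_by_position` and `rollpe_scores` are hypothetical.

```python
import numpy as np


def roll_by_position(x):
    """Circularly shift row p of x by p steps along the feature axis.

    x has shape (seq_len, dim). This is an assumed reading of the
    "circular roll" in the abstract, not the paper's reference code.
    """
    seq_len, dim = x.shape
    # idx[p, j] = (j - p) mod dim, so out[p, j] = x[p, (j - p) % dim],
    # which matches np.roll(x[p], p) for every row p.
    idx = (np.arange(dim)[None, :] - np.arange(seq_len)[:, None]) % dim
    return np.take_along_axis(x, idx, axis=1)


def rollpe_scores(q, k):
    """Unnormalized attention logits after rolling queries and keys."""
    return roll_by_position(q) @ roll_by_position(k).T


# Use the same content vector at every position so that any variation
# in the scores can only come from the positional roll.
rng = np.random.default_rng(0)
seq_len, dim = 6, 8
q0, k0 = rng.normal(size=dim), rng.normal(size=dim)
scores = rollpe_scores(np.tile(q0, (seq_len, 1)), np.tile(k0, (seq_len, 1)))

# Each logit equals q0 dotted with k0 rolled by the offset n - m,
# so it depends only on the positional difference.
for m in range(seq_len):
    for n in range(seq_len):
        assert np.allclose(scores[m, n], q0 @ np.roll(k0, n - m))
```

Under this assumption the shift is periodic in the feature dimension, so offsets that differ by a multiple of `dim` give the same score.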

Problem

Research questions and friction points this paper is trying to address.

Proposing RollPE as a traveling-wave-based positional encoding for Transformers
Enhancing relative positional awareness through circular shift operations
Establishing mathematical equivalence between RollPE and RoPE configurations

Innovation

Methods, ideas, or system contributions that make the work stand out.

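RollPE applies a circular roll to the query and key tensors as its positional encoding
It computes attention as a function of positional differences rather than absolute indices
Its continuous case implicitly imposes a topographic structure on the query and key space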
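
One way to see why a roll-based encoding can coincide with a RoPE configuration is the shift theorem of the discrete Fourier transform. The derivation below is a sketch under the assumption that position $p$ rolls a $d$-dimensional vector by $p$ steps; the symbols $S$, $\hat{x}$, and $\theta_\ell$ are introduced here for illustration, and the paper's own derivation may differ.

```latex
% Cyclic shift operator on R^d (assumed form of the roll):
(Sx)_j = x_{(j-1) \bmod d}, \qquad S^\top = S^{-1}.

% Rolling the query at position m and the key at position n:
(S^m q_m)^\top (S^n k_n) = q_m^\top S^{\,n-m} k_n ,
% so the score depends on the positions only through n - m.

% In the DFT basis a shift by p acts diagonally:
\widehat{(S^p x)}_\ell = e^{-2\pi i \ell p / d}\, \hat{x}_\ell ,
\qquad \ell = 0, \dots, d-1 .

% Each frequency component is rotated by the angle p\,\theta_\ell with
\theta_\ell = \frac{2\pi \ell}{d},
% i.e. position-proportional rotations, as in RoPE, with a fixed set of frequencies.
```

Read this way, the roll acts as a set of position-proportional rotations expressed in a Fourier basis, which is consistent with the stated equivalence to a particular configuration of RoPE.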
Chase van de Geijn
Institute of Computer Science and Campus Institute Data Science, University of Göttingen
Ayush Paliwal
Institute of Computer Science and Campus Institute Data Science, University of Göttingen; Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
Timo Lüddecker
Institute of Computer Science and Campus Institute Data Science, University of Göttingen
Alexander S. Ecker
University of Göttingen, Germany
Computational Neuroscience, Vision, Machine Learning, Computer Vision, Data Science