Do traveling waves make good positional encodings?

📅 2025-11-11

🤖 AI Summary

Absolute positional encodings in Transformers do not directly capture relative positions. This paper proposes RollPE, a positional encoding based on traveling waves. RollPE applies a circular roll to the query and key tensors in self-attention, which induces a relative phase shift across positions, so attention becomes a function of positional differences rather than absolute indices. The authors derive a continuous version of RollPE that implicitly imposes a topographic structure on the query and key space, and they show that RollPE is mathematically equivalent to a particular configuration of RoPE. Empirically, RollPE significantly outperforms traditional absolute positional embeddings and is comparable to RoPE. The traveling-wave view may also help simplify RoPE and relate it to information flow in the brain.

📝 Abstract

Transformers rely on positional encoding to compensate for the inherent permutation invariance of self-attention. Traditional approaches use absolute sinusoidal embeddings or learned positional vectors, while more recent methods emphasize relative encodings to better capture translation equivariances. In this work, we propose RollPE, a novel positional encoding mechanism based on traveling waves, implemented by applying a circular roll operation to the query and key tensors in self-attention. This operation induces a relative shift in phase across positions, allowing the model to compute attention as a function of positional differences rather than absolute indices. We show this simple method significantly outperforms traditional absolute positional embeddings and is comparable to RoPE. We derive a continuous case of RollPE which implicitly imposes a topographic structure on the query and key space. We further derive a mathematical equivalence of RollPE to a particular configuration of RoPE. Viewing RollPE through the lens of traveling waves may allow us to simplify RoPE and relate it to processes of information flow in the brain.
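The roll operation described in the abstract can be illustrated with a small pure-Python sketch. It rests on an assumption the abstract does not spell out: that each query and key vector is rolled along its feature axis by an amount equal to its own token position. The helper names `roll`, `dot`, and `rollpe_score` are hypothetical and chosen only for this illustration; they are not the authors' code.

```python
import random

def roll(vec, shift):
    """Circularly shift a list to the right by `shift` places."""
    s = shift % len(vec)
    return vec[-s:] + vec[:-s] if s else list(vec)

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def rollpe_score(q, k, i, j):
    """Attention logit for a query at position i and a key at position j.

    Assumption (not confirmed by the abstract): each vector is rolled
    along its feature axis by its own position index.
    """
    return dot(roll(q, i), roll(k, j))

random.seed(0)
d = 8  # feature dimension, arbitrary for this sketch
q = [random.gauss(0, 1) for _ in range(d)]
k = [random.gauss(0, 1) for _ in range(d)]

# Two position pairs with the same offset j - i = 3.
s_a = rollpe_score(q, k, 2, 5)
s_b = rollpe_score(q, k, 10, 13)

# Equivalent relative form: leave q untouched, roll k by the offset only.
s_rel = dot(q, roll(k, 3))

print(abs(s_a - s_b) < 1e-9, abs(s_a - s_rel) < 1e-9)
```

Under this assumption the logit depends only on the offset between the two positions, which is the relative behavior the abstract describes. Note that in this discrete sketch offsets wrap around modulo the feature dimension `d`, so two offsets that differ by a multiple of `d` produce the same logit.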

Problem

Research questions and friction points this paper is trying to address.

Proposing RollPE as traveling wave-based positional encoding for transformers

Enhancing relative positional awareness through circular shift operations

Establishing mathematical equivalence between RollPE and RoPE configurations
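One way to see why an equivalence with RoPE is plausible is the discrete Fourier shift theorem. The sketch below is not the paper's derivation, which this page does not reproduce. It assumes real query and key vectors of dimension $d$, a roll along the feature axis by the token position, and introduces the symbols $R$, $\hat{q}$, $\hat{k}$, and $\theta_f$ purely for illustration.

```latex
% R is the cyclic shift on R^d:  (R^p x)[m] = x[(m - p) mod d].
% \hat{x}[f] = \sum_{m=0}^{d-1} x[m] e^{-2\pi i f m / d} is the DFT of x.
\begin{align}
  \widehat{R^{p} x}[f] &= e^{-2\pi i f p / d}\,\hat{x}[f]
    && \text{(shift theorem)} \\
  \langle R^{i} q,\; R^{j} k \rangle
    &= \frac{1}{d} \sum_{f=0}^{d-1}
       \widehat{R^{i} q}[f]\;\overline{\widehat{R^{j} k}[f]}
    && \text{(Parseval)} \\
    &= \frac{1}{d} \sum_{f=0}^{d-1}
       e^{\,i \theta_f (j - i)}\,\hat{q}[f]\,\overline{\hat{k}[f]},
    \qquad \theta_f = \frac{2\pi f}{d}.
\end{align}
```

In the Fourier basis, each frequency component is rotated by an angle proportional to the position offset $j - i$, which has the same form as RoPE's per-component rotations. Under these assumptions the rotation frequencies $\theta_f$ are linearly spaced, which is consistent with the abstract's statement that the equivalence holds for a particular configuration of RoPE.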

Innovation

Methods, ideas, or system contributions that make the work stand out.

RollPE uses circular roll for positional encoding

It computes attention based on positional differences

In its continuous form, it implicitly imposes a topographic structure on the query and key space
