🤖 AI Summary
To address a limitation of absolute positional encodings in Transformers, namely their difficulty in modeling relative positional relationships, this paper proposes RollPE, a positional encoding motivated by the physical analogy of traveling waves. RollPE applies position-dependent cyclic shifts (circular rolls) to the query and key tensors, which induces a relative phase shift across positions so that attention depends on positional differences rather than on absolute indices. On the theoretical side, the paper frames positional encoding through the lens of traveling waves, shows that RollPE is mathematically equivalent to a particular configuration of RoPE, and derives a continuous formulation that implicitly imposes a topographic structure on the query and key space. Empirically, RollPE significantly outperforms traditional absolute positional embeddings and performs comparably to RoPE. The authors suggest that the traveling-wave view may help simplify RoPE and relate it to processes of information flow in the brain.
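The relative-position property described above can be checked with a small sketch. It assumes, for illustration only, that the roll is applied along the feature dimension of each query and key vector with a shift of `step * position`; the helper names `roll`, `roll_pe`, `attention_logit` and the `step` parameter are hypothetical and not taken from the paper. Because a circular roll is a permutation, it preserves dot products, so rolling the query by `m` and the key by `n` gives the same score as rolling only the key by `n - m`.

```python
import random


def roll(vec, shift):
    """Circularly shift a list right by `shift` places (same convention as numpy.roll for 1-D input)."""
    d = len(vec)
    shift %= d
    return vec[-shift:] + vec[:-shift]


def roll_pe(x, position, step=1):
    """Hypothetical RollPE-style encoding: roll the feature vector by step * position."""
    return roll(x, step * position)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_logit(q, k, m, n, step=1):
    """Unnormalized attention score between a query at position m and a key at position n."""
    return dot(roll_pe(q, m, step), roll_pe(k, n, step))


rng = random.Random(0)
d = 8
# Integer features keep the arithmetic exact, so equality checks are safe.
q = [rng.randint(-5, 5) for _ in range(d)]
k = [rng.randint(-5, 5) for _ in range(d)]

# Shifting both positions by the same offset leaves the score unchanged:
# only the relative offset n - m (modulo d) matters.
print(attention_logit(q, k, 3, 1) == attention_logit(q, k, 10, 8))  # True
```

In this toy version the encoding is periodic: with `step=1`, positions that differ by a multiple of `d` receive identical shifts, so relative offsets are only distinguished modulo `d`. The summary does not say how the paper handles sequences longer than this period.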
📝 Abstract
Transformers rely on positional encoding to compensate for the inherent permutation invariance of self-attention. Traditional approaches use absolute sinusoidal embeddings or learned positional vectors, while more recent methods emphasize relative encodings to better capture translation equivariance. In this work, we propose RollPE, a novel positional encoding mechanism based on traveling waves, implemented by applying a circular roll operation to the query and key tensors in self-attention. This operation induces a relative shift in phase across positions, allowing the model to compute attention as a function of positional differences rather than absolute indices. We show that this simple method significantly outperforms traditional absolute positional embeddings and is comparable to RoPE. We derive a continuous formulation of RollPE, which implicitly imposes a topographic structure on the query and key space. We further show that RollPE is mathematically equivalent to a particular configuration of RoPE. Viewing RollPE through the lens of traveling waves may allow us to simplify RoPE and relate it to processes of information flow in the brain.
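One standard way to see why a roll can be equivalent to a rotary encoding is the discrete Fourier transform (DFT) shift theorem: circularly shifting a length-`d` vector by `s` multiplies its `j`-th Fourier coefficient by `exp(-2πi·j·s/d)`, which is a 2D rotation of that coefficient. The sketch below checks this numerically and then uses Parseval's theorem to show that the rolled query-key score equals a RoPE-style sum with frequencies `2πj/d` that depends only on `n - m`. It is an illustration under our own assumptions (roll along the feature dimension, a naive DFT, hypothetical helper names such as `rotate_spectrum` and `score_via_rotations`), not the paper's derivation, and the paper's "particular configuration" of RoPE may differ.

```python
import cmath
import math
import random


def roll(vec, shift):
    """Circularly shift a list right by `shift` places (same convention as numpy.roll for 1-D input)."""
    d = len(vec)
    shift %= d
    return vec[-shift:] + vec[:-shift]


def dft(x):
    """Naive O(d^2) DFT: X[j] = sum_t x[t] * exp(-2*pi*i*j*t/d)."""
    d = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * j * t / d) for t in range(d)) for j in range(d)]


def rotate_spectrum(X, shift):
    """Rotate each Fourier coefficient j by the angle -2*pi*j*shift/d."""
    d = len(X)
    return [X[j] * cmath.exp(-2j * math.pi * j * shift / d) for j in range(d)]


def score_via_rotations(q, k, m, n):
    """RoPE-style score with frequencies 2*pi*j/d; depends on m and n only through n - m."""
    d = len(q)
    Q, K = dft(q), dft(k)
    total = sum(
        Q[j].conjugate() * K[j] * cmath.exp(-2j * math.pi * j * (n - m) / d)
        for j in range(d)
    )
    # Parseval: for real vectors, sum_t a[t] * b[t] = (1/d) * sum_j conj(A[j]) * B[j].
    return (total / d).real


rng = random.Random(1)
d = 8
x = [rng.uniform(-1, 1) for _ in range(d)]

# Shift theorem: rolling in feature space equals rotating every Fourier coefficient.
print(all(abs(a - b) < 1e-9 for a, b in zip(dft(roll(x, 3)), rotate_spectrum(dft(x), 3))))  # True

q = [rng.uniform(-1, 1) for _ in range(d)]
k = [rng.uniform(-1, 1) for _ in range(d)]
direct = sum(a * b for a, b in zip(roll(q, 2), roll(k, 5)))  # query at m=2, key at n=5
print(abs(direct - score_via_rotations(q, k, 2, 5)) < 1e-9)  # True
```

Because the DFT is a fixed unitary change of basis, this suggests that a feature-dimension roll behaves like RoPE with evenly spaced frequencies applied in a rotated feature basis; whether the paper's equivalence takes exactly this form is an assumption.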