Group Representational Position Encoding

πŸ“… 2025-12-08
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing positional encodings (e.g., RoPE, ALiBi) suffer from geometric rigidity and functional limitations in long-context modeling. To address this, we propose GRAPE, the first unified positional encoding framework grounded in group action theory. GRAPE integrates multiplicative rotations (SO(d) group actions) and additive logit biases (unipotent GL group actions), jointly modeling relative positions while enabling streaming KV caching and context-length extension. It further introduces learnable commuting subspaces and a non-commutative hybrid architecture to enable cross-subspace feature coupling. By leveraging closed-form matrix exponentials and low-rank decomposition, GRAPE balances expressivity and computational efficiency, and it exactly recovers RoPE, ALiBi, and FoX as special cases, with controllable complexity ranging from O(d) to O(rd). Empirically, GRAPE significantly improves long-sequence modeling while preserving relative position awareness and cache efficiency.
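The multiplicative action above can be made concrete with a small numerical sketch. This is a minimal illustration of the RoPE special case on a single canonical 2-D plane; the names `grape_action` and `omega` are illustrative and not taken from the paper's code. For a normalized rank-2 skew generator L (which satisfies L³ = −L), the matrix exponential truncates to a closed form, so no general-purpose `expm` is needed:

```python
import numpy as np

def grape_action(n: float, omega: float, d: int = 2) -> np.ndarray:
    """Closed-form exp(n * omega * L) for the canonical rank-2 skew
    generator L acting on the coordinate pair (0, 1)."""
    L = np.zeros((d, d))
    L[0, 1], L[1, 0] = -1.0, 1.0            # skew-symmetric: L.T == -L
    theta = n * omega
    # Here L @ L = -I on the plane, so the exponential series truncates:
    # exp(theta * L) = I + sin(theta) * L + (1 - cos(theta)) * (L @ L)
    return np.eye(d) + np.sin(theta) * L + (1.0 - np.cos(theta)) * (L @ L)

R = grape_action(n=3, omega=0.1)            # a 2-D rotation by angle 0.3
```

The resulting map is orthogonal (norm-preserving) and satisfies the exact relative law G(m) G(n)ᡀ = G(m − n), which is what makes streaming KV caching possible: cached keys rotated at their own positions can be compared against any later query.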

πŸ“ Abstract
We present GRAPE (Group RepresentAtional Position Encoding), a unified framework for positional encoding based on group actions. GRAPE brings together two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in $\mathrm{SO}(d)$ and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group $\mathrm{GL}$. In Multiplicative GRAPE, a position $n \in \mathbb{Z}$ (or $t \in \mathbb{R}$) acts as $\mathbf{G}(n)=\exp(n\,\omega\,\mathbf{L})$ with a rank-2 skew generator $\mathbf{L} \in \mathbb{R}^{d \times d}$, yielding a relative, compositional, norm-preserving map with a closed-form matrix exponential. RoPE is recovered exactly when the $d/2$ planes are the canonical coordinate pairs with a log-uniform spectrum. Learned commuting subspaces and compact non-commuting mixtures strictly extend this geometry to capture cross-subspace feature coupling at $O(d)$ and $O(rd)$ cost per head, respectively. In Additive GRAPE, additive logits arise as rank-1 (or low-rank) unipotent actions, recovering ALiBi and the Forgetting Transformer (FoX) as exact special cases while preserving an exact relative law and streaming cacheability. Altogether, GRAPE supplies a principled design space for positional geometry in long-context models, subsuming RoPE and ALiBi as special cases. Project Page: https://github.com/model-architectures/GRAPE.
Problem

Research questions and friction points this paper is trying to address.

Rotary (RoPE) and bias-based (ALiBi) positional encodings lack a common theoretical framework.
Fixed positional geometries are rigid, limiting long-context modeling and extension.
Richer, learned geometries must still preserve relative position awareness and streaming KV-cache efficiency.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified positional-encoding framework based on group actions
Multiplicative rotations in SO(d) yielding compositional, norm-preserving relative maps
Additive logit biases from unipotent GL actions, compatible with streaming KV caching
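The O(d)-per-head cost of the commuting (RoPE-style) case comes from applying the block-diagonal SO(d) action pairwise instead of materializing a d × d matrix. A sketch under the assumption of the canonical log-uniform spectrum, with frequencies of the familiar RoPE form base^(-2i/d); the function name `rotate_features` is illustrative:

```python
import numpy as np

def rotate_features(x: np.ndarray, n: float, base: float = 10000.0) -> np.ndarray:
    """Rotate each coordinate pair (x[2i], x[2i+1]) by angle n * omega_i,
    where omega_i follows a log-uniform spectrum. Cost is O(d), not O(d^2)."""
    d = x.shape[-1]
    i = np.arange(d // 2)
    omega = base ** (-2.0 * i / d)           # log-uniform frequency spectrum
    theta = n * omega
    cos, sin = np.cos(theta), np.sin(theta)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = cos * x_even - sin * x_odd   # per-plane 2-D rotation
    out[..., 1::2] = sin * x_even + cos * x_odd
    return out
```

The pairwise form preserves norms and realizes the relative law at the level of inner products: a query rotated at position m against a key rotated at position n matches a rotation by m − n.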