LieRE: Generalizing Rotary Position Encodings

📅 2024-06-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

RoPE is limited when modeling multidimensional data such as 2D and 3D images: it is constrained to one-dimensional sequences and, even with learnable phases, has limited representational capacity. LieRE (Lie Relative Encodings) generalizes rotation-based position encoding from 1D sequences to higher-dimensional inputs. Drawing on Lie group theory, it replaces RoPE's block-2D rotation matrix with a learned, dense, high-dimensional rotation matrix of variable sparsity that encodes relative position. Evaluated on three image datasets across 2D and 3D classification, LieRE achieves a 2% relative improvement over state-of-the-art baselines on 2D tasks and 1.5% on 3D tasks, and generalizes better to higher-resolution inputs. The CIFAR-100 results are reproducible on four A100 GPUs in 30 minutes.

📝 Abstract

Transformer architectures rely on position encodings to capture token dependencies. Rotary Position Encoding (RoPE) has emerged as a popular choice in language models due to its efficient encoding of relative position information through key-query rotations. However, RoPE faces significant limitations beyond language processing: it is constrained to one-dimensional sequence data and, even with learnable phases, offers limited representational capacity. We address these challenges with Lie Relative Encodings (LieRE), which replaces RoPE's block-2D rotation matrix with a learned, dense, high-dimensional rotation matrix of variable sparsity. Through extensive evaluation on three image datasets across 2D and 3D classification tasks, LieRE achieves a 2% relative improvement over state-of-the-art baselines on 2D tasks and 1.5% on 3D tasks, while demonstrating superior generalization to higher resolutions. Our implementation is computationally efficient, with results reproducible on 4 A100 GPUs in 30 minutes on CIFAR100, and we release our code to facilitate further research.
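To make the "key-query rotations" in the abstract concrete, here is a minimal sketch of standard 1D RoPE in plain Python. It is an illustration, not the paper's code: the function name `rope_rotate`, the frequency base of 10000, and the toy vectors are assumptions chosen for the example. Each consecutive pair of features is rotated by an angle proportional to the token position, which is the block-2D structure the abstract refers to.

```python
import math

def rope_rotate(vec, pos, base=10000.0):
    """Rotate each consecutive feature pair of `vec` by an angle proportional to `pos`.

    Hypothetical helper: pair j uses the frequency base ** (-2j / d),
    the usual RoPE frequency schedule.
    """
    d = len(vec)
    assert d % 2 == 0, "feature dimension must be even"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # i = 2j, so this is base ** (-2j / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([c * x - s * y, s * x + c * y])  # 2x2 rotation of one pair
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy query and key vectors (made up for the example).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Two position pairs with the same offset (3) but different absolute positions.
score_a = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_b = dot(rope_rotate(q, 13), rope_rotate(k, 10))
```

The two scores agree up to floating-point error, because the attention score between a rotated query and a rotated key depends only on the difference of their positions. The single scalar `pos` is also where the one-dimensional constraint mentioned in the abstract shows up.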

Problem

Research questions and friction points this paper is trying to address.

Enhances position encoding in Transformers

Extends RoPE to multi-dimensional data

Improves generalization and efficiency in image tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

LieRE replaces RoPE's block-2D rotation matrix

Learned, dense, high-dimensional rotation matrix

Superior generalization to higher resolutions
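One way to picture a learned, dense rotation matrix that depends on a multi-dimensional position is the standard Lie-group construction: combine learned skew-symmetric generators linearly with the position coordinates, then take the matrix exponential, which always yields a rotation. The sketch below assumes this construction; whether it matches the paper's implementation in detail is an assumption, and every name (`skew_from_params`, `position_rotation`, `gen_x`, `gen_y`) and every parameter value is hypothetical. The matrix exponential uses a simple scaling-and-squaring Taylor series to stay within the standard library.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def transpose(X):
    return [list(row) for row in zip(*X)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def skew_from_params(params, n):
    """Fill the upper triangle with n*(n-1)/2 parameters and mirror with a sign flip."""
    A = [[0.0] * n for _ in range(n)]
    it = iter(params)
    for i in range(n):
        for j in range(i + 1, n):
            v = next(it)
            A[i][j], A[j][i] = v, -v
    return A

def expm(A, squarings=10, terms=12):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    n = len(A)
    scale = 2.0 ** squarings
    small = [[a / scale for a in row] for row in A]
    result, term = identity(n), identity(n)
    for t in range(1, terms + 1):
        term = [[v / t for v in row] for row in matmul(term, small)]
        result = [[r + v for r, v in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)  # exp(A) = exp(A / 2**s) ** (2**s)
    return result

def position_rotation(generators, pos):
    """Rotation for a position: exp(sum_i pos[i] * generators[i])."""
    n = len(generators[0])
    A = [[sum(p * G[i][j] for p, G in zip(pos, generators)) for j in range(n)]
         for i in range(n)]
    return expm(A)

# Stand-ins for learned parameters: one generator per spatial axis, head dim 4.
gen_x = skew_from_params([0.1, -0.2, 0.3, 0.05, -0.15, 0.25], 4)
gen_y = skew_from_params([-0.3, 0.2, 0.1, 0.4, -0.05, 0.12], 4)

# Rotation for the 2D patch position (3, 1); it is dense, not block-diagonal.
R = position_rotation([gen_x, gen_y], (3.0, 1.0))
orthogonality_error = max_abs_diff(matmul(transpose(R), R), identity(4))

# With a single generator, the product R(2)^T R(5) equals R(3).
R2 = position_rotation([gen_x], (2.0,))
R5 = position_rotation([gen_x], (5.0,))
R3 = position_rotation([gen_x], (3.0,))
relative_error = max_abs_diff(matmul(transpose(R2), R5), R3)
```

Both error values come out at floating-point noise level. The exact relative-position identity in the second check relies on the exponents commuting, which is automatic with one generator; with several generators that do not commute, the product of two position rotations need not reduce to a function of the position difference alone.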

🔎 Similar Papers

Round and Round We Go! What makes Rotary Positional Encodings useful?