🤖 AI Summary
RoPE exhibits limitations when modeling multidimensional data (e.g., 2D/3D images): it is restricted to one-dimensional sequences and, even with learnable phases, offers limited positional expressiveness. To address this, the authors propose LieRE, a rotation-based positional encoding framework that generalizes RoPE from 1D sequences to higher-dimensional inputs. Leveraging Lie group theory, LieRE replaces RoPE's block-diagonal 2D rotations with learned high-dimensional rotation matrices of variable sparsity that encode relative positional relationships, trading off representational capacity against computational cost. On 2D and 3D image classification benchmarks, LieRE achieves 2% and 1.5% relative accuracy improvements over prior state-of-the-art methods, respectively, while generalizing better to high-resolution inputs. Notably, the CIFAR-100 results are reproducible on four A100 GPUs in 30 minutes, underscoring a strong performance–efficiency trade-off.
📝 Abstract
Transformer architectures rely on position encodings to capture token dependencies. Rotary Position Encoding (RoPE) has emerged as a popular choice in language models due to its efficient encoding of relative position information through key-query rotations. However, RoPE faces significant limitations beyond language processing: it is constrained to one-dimensional sequence data and, even with learnable phases, offers limited representational capacity. We address these challenges with Lie Relative Encodings (LieRE), which replaces RoPE's block-2D rotation matrix with a learned, dense, high-dimensional rotation matrix of variable sparsity. Through extensive evaluation on three image datasets across 2D and 3D classification tasks, LieRE achieves a 2% relative improvement over state-of-the-art baselines on 2D tasks and 1.5% on 3D tasks, while demonstrating superior generalization to higher resolutions. Our implementation is computationally efficient, with results reproducible on 4 A100 GPUs in 30 minutes on CIFAR-100, and we release our code to facilitate further research.
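The core idea described above — replacing RoPE's fixed 2D block rotations with a learned high-dimensional rotation derived from a Lie-group (matrix-exponential) parameterization — can be sketched as follows. This is a minimal illustrative example, not the authors' released implementation: the names (`rotation`, `theta`) and the specific parameterization (one learned skew-symmetric generator per position axis, combined linearly) are assumptions for illustration.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
head_dim, pos_dim = 8, 2  # e.g. 2-D image patch positions (hypothetical sizes)

# Learned generator parameters: one set of upper-triangular entries per
# position axis. In a real model these would be trained; here they are random.
iu = np.triu_indices(head_dim, k=1)
theta = 0.1 * rng.normal(size=(pos_dim, iu[0].size))

def rotation(pos):
    """Rotation for an n-D position: expm of the skew-symmetric A(pos).

    A(pos) = sum_k pos[k] * A_k, where each A_k is skew-symmetric, so
    expm(A(pos)) is orthogonal with determinant 1 (a rotation matrix).
    """
    A = np.zeros((head_dim, head_dim))
    for k in range(pos_dim):
        A[iu] += pos[k] * theta[k]  # fill upper triangle
    A = A - A.T                     # make skew-symmetric
    return expm(A)

# Applied to keys/queries before the attention dot product, so that
# (R(p) q) . (R(k) v) depends on the positions only through the rotations.
R = rotation(np.array([1.0, 2.0]))
```

When positions vary along a single axis the generators commute, so `rotation(p).T @ rotation(q)` equals `rotation(q - p)` exactly, recovering RoPE's relative-position property; RoPE itself is the special case where the generator is block-diagonal with fixed 2x2 blocks.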