🤖 AI Summary
This work addresses the computational redundancy and suboptimal hardware utilization of existing Rotary Position Embedding (RoPE) implementations, which rely on vector-level splitting and merging operations that become inefficient in multidimensional settings. The paper proposes RoME, the first approach to unify and reformulate RoPE as a matrix transformation, eliminating dimension-dependent operations while preserving mathematical equivalence. This reformulation simplifies implementation and enables fused parallel execution across both Cube and Vector units on modern neural processing units (NPUs). Experimental results demonstrate that RoME achieves significant acceleration at both the operator and full-model levels, substantially improving the inference efficiency and hardware compatibility of Transformers across language, vision, and 3D tasks.
📝 Abstract
Rotary Position Embedding (RoPE) has become a core component of modern Transformer architectures across language, vision, and 3D domains. However, existing implementations rely on vector-level split and merge operations that introduce non-negligible computational overhead, often overlooked in attention optimization. The problem is further amplified in multi-dimensional settings (e.g., 2D and 3D RoPE), where additional vector operations and uneven feature partitions degrade hardware utilization. To overcome these limitations, we propose RoME (Rotary Matrix position Embedding), a mathematically equivalent yet computationally efficient reformulation of RoPE that replaces vector operations with unified matrix transformations. RoME eliminates dimension-specific operations, simplifies implementation, and enables fused parallel execution across Cube and Vector units on modern NPUs. Experiments show that RoME delivers substantial acceleration at both the operator and full-model levels. The implementation is available at https://gitcode.com/cann/ops-transformer/blob/master/experimental/posembedding/rope_matrix/README.md.
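To make the contrast concrete, here is a minimal NumPy sketch, not the paper's implementation, comparing the conventional "rotate-half" vector form of RoPE (split, element-wise rotate, merge) with a mathematically equivalent formulation as a single matrix transformation. Function names and the toy dimensions are illustrative assumptions; RoME's actual kernel fuses this on NPU Cube/Vector units.

```python
import numpy as np

def rope_vector(x, cos, sin):
    # Conventional RoPE ("rotate-half" form): split the feature vector
    # into two halves, rotate via element-wise ops, then merge.
    x1, x2 = np.split(x, 2, axis=-1)
    rotated = np.concatenate((-x2, x1), axis=-1)
    cos_full = np.concatenate((cos, cos), axis=-1)
    sin_full = np.concatenate((sin, sin), axis=-1)
    return x * cos_full + rotated * sin_full

def rope_matrix(x, cos, sin):
    # Equivalent matrix form: build one rotation matrix R and apply a
    # single matmul (x @ R.T). No split/merge on the feature vector;
    # each dimension pair (i, i + d/2) is rotated by its own angle.
    d = x.shape[-1]
    half = d // 2
    R = np.zeros((d, d))
    idx = np.arange(half)
    R[np.arange(d), np.arange(d)] = np.concatenate((cos, cos))
    R[idx, idx + half] = -sin
    R[idx + half, idx] = sin
    return x @ R.T

# Toy example: one 4-dim feature vector, two rotation angles.
x = np.array([0.3, -1.1, 0.7, 2.0])
angles = np.array([0.5, 1.2])
cos, sin = np.cos(angles), np.sin(angles)

# Both forms produce the same rotated vector.
print(np.allclose(rope_vector(x, cos, sin), rope_matrix(x, cos, sin)))
```

Since R is orthogonal (a product of independent 2D plane rotations), the transform also preserves the vector norm, which is one way to sanity-check an implementation.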