CRoPE: Efficient Parametrization of Rotary Positional Embedding

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses parameter redundancy in the conventional implementation of Rotary Position Embedding (RoPE) within the query, key, and value projections, which does not fully exploit the expressive efficiency of complex linear transformations. The authors propose a reformulation of RoPE grounded in genuine complex linear maps, integrating positional encoding directly into linear transformations in the complex domain. This reduces the number of parameters in the attention module by nearly 50% while leaving model performance almost unchanged, yielding a more concise and interpretable representation. Empirical evaluations show that the parameter reduction incurs negligible performance degradation on both in-distribution and out-of-distribution tasks, confirming the method's parameter efficiency.
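The "nearly 50%" figure is consistent with simple parameter counting: a genuinely complex linear map on d/2 complex coordinates needs half the real parameters of a real d×d projection. A minimal sketch of that arithmetic (the dimension d = 512 is illustrative, not taken from the paper):

```python
# Parameter counting behind the ~50% reduction claim (a sketch; d is an
# illustrative head/model dimension, not a value from the paper).
d = 512

# Conventional projection: one real d x d matrix.
real_proj_params = d * d

# Complex-linear alternative: a (d/2) x (d/2) complex matrix, where each
# complex entry costs 2 real parameters.
complex_proj_params = 2 * (d // 2) ** 2

print(real_proj_params, complex_proj_params)  # 262144 131072
assert complex_proj_params == real_proj_params // 2
```

The same ratio holds for any even d, since 2·(d/2)² = d²/2.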

πŸ“ Abstract
Rotary positional embedding has become the state-of-the-art approach to encoding position information in transformer-based models. While it is often succinctly expressed in complex linear algebra, we note that the actual implementation of the $Q/K/V$-projections is not equivalent to a complex linear transformation. We argue that a complex linear transformation is a more natural parametrization and saves nearly 50% of the parameters within the attention block. We show empirically that removing this redundancy has negligible impact on model performance both in-sample and out-of-sample. Our modification achieves more efficient parameter usage, as well as a cleaner interpretation of the representation space.
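The complex-linear view can be illustrated in a few lines of NumPy. The rotation angles follow the standard RoPE convention (base 10000); composing them with a complex projection matrix is a sketch of the paper's idea under our own assumptions, not the authors' actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # head dimension (even); illustrative value

def rope_freqs(pos, half_dim, base=10000.0):
    """Unit complex rotations e^{i * pos * theta_k} with the standard RoPE angles."""
    k = np.arange(half_dim)
    return np.exp(1j * pos * base ** (-2.0 * k / (2 * half_dim)))

# View a real d-vector as d/2 complex coordinates (adjacent dimension pairs).
x = rng.standard_normal(d)
xc = x[0::2] + 1j * x[1::2]

# A genuinely complex (d/2) x (d/2) projection: half the real parameters
# of a real d x d matrix.
W = rng.standard_normal((d // 2, d // 2)) + 1j * rng.standard_normal((d // 2, d // 2))
assert W.size * 2 == d * d // 2

# Query/key at positions m and n: rotate after the complex-linear projection.
m, n = 7, 2
q_m = rope_freqs(m, d // 2) * (W @ xc)
k_n = rope_freqs(n, d // 2) * (W @ xc)

# RoPE's defining property: the complex inner product depends only on m - n,
# because the rotations form a diagonal unitary matrix.
lhs = np.vdot(q_m, k_n)
rhs = np.vdot(W @ xc, rope_freqs(n - m, d // 2) * (W @ xc))
assert np.isclose(lhs, rhs)
```

Because the rotation is diagonal and unitary, it commutes cleanly with the complex projection's role in the attention score, which is what makes folding position encoding into a complex linear map natural.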
Problem

Research questions and friction points this paper is trying to address.

Rotary Positional Embedding, parameter efficiency, attention mechanism, transformer models, complex linear transformation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rotary Positional Embedding, Complex Linear Transformation, Parameter Efficiency, Transformer Architecture, CRoPE