CRoPE: Efficient Parametrization of Rotary Positional Embedding

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses parameter redundancy in the conventional implementation of Rotary Position Embedding (RoPE) within the query, key, and value projections, which does not fully exploit the expressive efficiency of complex linear transformations. The authors propose a reformulation of RoPE grounded in genuine complex linear maps, integrating positional encoding directly into linear transformations in the complex domain. This reduces the number of parameters in the attention module by nearly 50% while leaving model performance almost unchanged, yielding a more concise and interpretable representation. Empirical evaluations show that the parameter reduction incurs negligible performance degradation on both in-distribution and out-of-distribution tasks, confirming the method's parameter efficiency.
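The "nearly 50%" figure is consistent with simple parameter counting: a genuinely complex linear map on d/2 complex coordinates needs half the real parameters of a real d×d projection. A minimal sketch of that arithmetic (the dimension d = 512 is illustrative, not taken from the paper):

```python
# Parameter counting behind the ~50% reduction claim (a sketch; d is an
# illustrative head/model dimension, not a value from the paper).
d = 512

# Conventional projection: one real d x d matrix.
real_proj_params = d * d

# Complex-linear alternative: a (d/2) x (d/2) complex matrix, where each
# complex entry costs 2 real parameters.
complex_proj_params = 2 * (d // 2) ** 2

print(real_proj_params, complex_proj_params)  # 262144 131072
assert complex_proj_params == real_proj_params // 2
```

The same ratio holds for any even d, since 2·(d/2)² = d²/2.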

πŸ“ Abstract
Rotary positional embedding has become the state-of-the-art approach to encoding position information in transformer-based models. While it is often succinctly expressed in complex linear algebra, we note that the actual implementation of the $Q/K/V$-projections is not equivalent to a complex linear transformation. We argue that a complex linear transformation is a more natural parametrization and saves nearly 50% of the parameters within the attention block. We show empirically that removing this redundancy has negligible impact on model performance both in-sample and out-of-sample. Our modification achieves more efficient parameter usage, as well as a cleaner interpretation of the representation space.
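The complex-linear view can be illustrated in a few lines of NumPy. The rotation angles follow the standard RoPE convention (base 10000); composing them with a complex projection matrix is a sketch of the paper's idea under our own assumptions, not the authors' actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # head dimension (even); illustrative value

def rope_freqs(pos, half_dim, base=10000.0):
    """Unit complex rotations e^{i * pos * theta_k} with the standard RoPE angles."""
    k = np.arange(half_dim)
    return np.exp(1j * pos * base ** (-2.0 * k / (2 * half_dim)))

# View a real d-vector as d/2 complex coordinates (adjacent dimension pairs).
x = rng.standard_normal(d)
xc = x[0::2] + 1j * x[1::2]

# A genuinely complex (d/2) x (d/2) projection: half the real parameters
# of a real d x d matrix.
W = rng.standard_normal((d // 2, d // 2)) + 1j * rng.standard_normal((d // 2, d // 2))
assert W.size * 2 == d * d // 2

# Query/key at positions m and n: rotate after the complex-linear projection.
m, n = 7, 2
q_m = rope_freqs(m, d // 2) * (W @ xc)
k_n = rope_freqs(n, d // 2) * (W @ xc)

# RoPE's defining property: the complex inner product depends only on m - n,
# because the rotations form a diagonal unitary matrix.
lhs = np.vdot(q_m, k_n)
rhs = np.vdot(W @ xc, rope_freqs(n - m, d // 2) * (W @ xc))
assert np.isclose(lhs, rhs)
```

Because the rotation is diagonal and unitary, it commutes cleanly with the complex projection's role in the attention score, which is what makes folding position encoding into a complex linear map natural.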
Problem

Research questions and friction points this paper is trying to address.

Rotary Positional Embedding, parameter efficiency, attention mechanism, transformer models, complex linear transformation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rotary Positional Embedding, Complex Linear Transformation, Parameter Efficiency, Transformer Architecture, CRoPE