🤖 AI Summary
Standard RoPE computes attention scores using only the real part of complex dot products, discarding the imaginary part, which encodes critical phase information, and thereby weakening the modeling of long-range positional dependencies. To address this, we propose Full-Complex RoPE, the first RoPE variant to fully incorporate the previously neglected imaginary component into the attention mechanism. It introduces a dual-branch attention scoring function that operates jointly on the real and imaginary parts, built on phase-preserving full-complex dot products. Theoretical analysis shows that this significantly enhances expressivity for positional relationships in long sequences and supports arbitrary context lengths without interpolation or extrapolation. Empirical evaluation across multiple long-context language modeling benchmarks shows consistent improvements over standard RoPE, with performance gains that grow with context length, validating the method's effectiveness, its scalability, and the predictions of the theoretical analysis.
📝 Abstract
Rotary Position Embeddings (RoPE) have become a standard for encoding sequence order in Large Language Models (LLMs) by applying rotations to query and key vectors in the complex plane. Standard implementations, however, utilize only the real component of the complex-valued dot product for attention score calculation. This simplification discards the imaginary component, which contains valuable phase information, leading to a potential loss of relational details crucial for modeling long-context dependencies. In this paper, we propose an extension that re-incorporates this discarded imaginary component. Our method leverages the full complex-valued representation to create a dual-component attention score. We theoretically and empirically demonstrate that this approach enhances the modeling of long-context dependencies by preserving more positional information. Furthermore, evaluations on a suite of long-context language modeling benchmarks show that our method consistently improves performance over the standard RoPE, with the benefits becoming more significant as context length increases. The code is available at https://github.com/OpenMOSS/rope_pp.
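The abstract does not spell out the exact dual-component scoring function, but the underlying idea can be illustrated with a minimal NumPy sketch: view consecutive feature pairs of the query and key as complex numbers, apply the RoPE rotation, take the *full* complex dot product, and combine its real and imaginary parts instead of keeping only the real part. The helper `rope_rotate` and the mixing weight `alpha` below are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Apply a RoPE rotation to a vector x (shape [d], d even) at
    integer position pos, packing consecutive pairs as complex numbers."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # standard RoPE frequencies
    xc = x[0::2] + 1j * x[1::2]                 # pair -> complex
    return xc * np.exp(1j * pos * freqs)        # rotate each pair

d = 8
rng = np.random.default_rng(0)
q, k = rng.normal(size=d), rng.normal(size=d)

qc = rope_rotate(q, pos=5)
kc = rope_rotate(k, pos=2)

# Full complex dot product <q, k> = sum_i q_i * conj(k_i).
dot = np.sum(qc * np.conj(kc))

# Standard RoPE keeps only the real part as the attention score:
score_standard = dot.real

# Hypothetical dual-component score re-incorporating the imaginary
# (phase) part; the actual combination in the paper may differ.
alpha = 0.5
score_full = dot.real + alpha * dot.imag
```

Note that the full complex dot product depends only on the relative offset between the two positions (shifting both positions by the same amount leaves it unchanged), so the imaginary part carries additional relative-position phase information rather than absolute-position leakage.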