The Rotary Position Embedding May Cause Dimension Inefficiency in Attention Heads for Long-Distance Retrieval

📅 2025-02-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies a “dimensional inefficiency” phenomenon induced by Rotary Position Embedding (RoPE) in long-context modeling: because RoPE’s rotation angles depend on position and therefore span a wide range over long sequences, certain attention-head dimensions are largely deactivated, diminishing their contribution to long-range retrieval and degrading long-context question-answering performance. Through position-embedding analysis, quantitative measurement of dimension utility, statistical characterization of RoPE’s angular spectrum, and controlled multi-model experiments, the authors establish a strong negative correlation between RoPE’s angular span and dimension utility. Empirical results further show that pruning these ineffective dimensions improves inference efficiency without compromising model performance. The findings highlight intrinsic limitations of positional-encoding schemes and inform the design of more efficient architectures for long-context language modeling.

📝 Abstract
The Rotary Position Embedding (RoPE) is widely used in the attention heads of many large language models (LLMs). It rotates pairs of dimensions in the query and key vectors by angles that depend on their positions in the input sequence. In long-context modeling, positions span a wide range, so RoPE rotates some dimensions through a correspondingly wide range of angles. We hypothesize that this wide range of rotation angles may prevent LLMs from utilizing those dimensions. To validate this hypothesis, we present a controlled experiment showing that applying RoPE causes low utility of certain dimensions. Our analyses on three LLMs also indicate that these dimensions do not help LLMs do long-context question answering.
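To make the mechanism concrete, here is a minimal sketch of standard RoPE in NumPy (not the paper's own code): each consecutive pair of dimensions is rotated by an angle `pos * base**(-2i/d)`, so low-index (high-frequency) pairs sweep many full rotations over a long context while high-index pairs barely move. The `base=10000.0` default and the pairing convention follow the common RoPE formulation; the angle-span printout at the end is only an illustration of the "wide range of angles" the abstract refers to.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Rotate consecutive dimension pairs of vector x by the
    position-dependent angles theta_i = pos * base**(-2i/d)."""
    d = x.shape[-1]
    i = np.arange(d // 2)
    theta = pos * base ** (-2.0 * i / d)   # one angle per dimension pair
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[..., 0::2], x[..., 1::2]    # the two halves of each pair
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin   # 2x2 rotation applied pairwise
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Angle range swept by each pair over a context of length L:
# pair i covers [0, L * base**(-2i/d)] radians.
d, L = 128, 32768
spans = L * 10000.0 ** (-2.0 * np.arange(d // 2) / d)
print(spans[0], spans[-1])  # first pair sweeps many rotations; last barely moves
```

Because each pair is a pure rotation, vector norms are preserved; the inefficiency the paper studies concerns how attention uses these rotated dimensions, not any loss of magnitude.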
Problem

Research questions and friction points this paper is trying to address.

Rotary Position Embedding inefficiency
Dimension underutilization in LLMs
Long-distance retrieval challenges
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rotary Position Embedding analysis
Dimension inefficiency investigation
Long-distance retrieval optimization