🤖 AI Summary
Existing extrapolation methods for rotary position embedding (RoPE) lack a unified theoretical foundation, limiting their effectiveness when extending to sequence lengths far beyond those seen during pretraining. This work addresses that gap with MrRoPE, a generalized positional-encoding formulation grounded in a mixed-radix perspective, which establishes the first unified theoretical framework for RoPE extrapolation. Within this framework, two training-free extension strategies, MrRoPE-Uni and MrRoPE-Pro, are proposed. The approach substantially enhances long-context modeling: it sustains over 85% recall on the 128K-length Needle-in-a-Haystack task and more than doubles YaRN's accuracy on the retrieval and dialogue subsets of InfiniteBench.
📝 Abstract
Rotary Position Embedding (RoPE) extension refers to modifying or generalizing the rotary position embedding scheme to handle sequences longer than those encountered during pre-training. However, current extension strategies are highly diverse and lack a unified theoretical foundation. In this paper, we propose MrRoPE (Mixed-radix RoPE), a generalized encoding formulation based on a radix-system conversion perspective, which elegantly unifies various RoPE-extension approaches as distinct radix-conversion strategies. Based on this theory, we introduce two training-free extensions, MrRoPE-Uni and MrRoPE-Pro, which leverage uniform and progressive radix-conversion strategies, respectively, to achieve 'train short, test long' generalization. Without fine-tuning, MrRoPE-Pro sustains over 85% recall in the 128K-context Needle-in-a-Haystack test and achieves more than double YaRN's accuracy on the InfiniteBench retrieval and dialogue subsets. Theoretical analysis confirms that MrRoPE-Pro effectively raises the upper bound of RoPE's attainable encoding length, further validating the reliability and utility of our theory and methodology.
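The mixed-radix intuition can be illustrated with a small sketch. This is a hypothetical toy example, not the paper's actual MrRoPE formulation: the function names, the choice of radices, and the digit decomposition below are all illustrative assumptions. It shows standard RoPE rotation angles alongside a mixed-radix decomposition of an integer position, under which a position far beyond a short training range is re-expressed as several small digits.

```python
# Hypothetical sketch of the mixed-radix view of positions (illustrative
# assumptions only; not the paper's actual MrRoPE method).

def rope_angles(pos, dim, base=10000.0):
    """Standard RoPE rotation angles for one position: pos * base^(-2i/dim)."""
    return [pos * base ** (-2 * i / dim) for i in range(dim // 2)]

def to_mixed_radix(pos, radices):
    """Decompose a nonnegative integer position into digits under the given
    mixed radices, least-significant digit first; the final entry carries
    any overflow beyond the listed radices."""
    digits = []
    for r in radices:
        digits.append(pos % r)
        pos //= r
    digits.append(pos)
    return digits

def from_mixed_radix(digits, radices):
    """Inverse of to_mixed_radix: recombine digits into a position."""
    pos, weight = 0, 1
    for d, r in zip(digits, radices):
        pos += d * weight
        weight *= r
    pos += digits[-1] * weight
    return pos

# A 100K-scale position becomes small digits under two radices of 512
# (radix choice is arbitrary here):
digits = to_mixed_radix(100_000, [512, 512])
print(digits)  # [160, 195, 0]
assert from_mixed_radix(digits, [512, 512]) == 100_000
```

Each digit stays within a bounded range regardless of the absolute position, which is the flavor of property a radix-conversion view of RoPE extension exploits.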