🤖 AI Summary
Existing extrapolation methods for rotary position embedding (RoPE) lack a unified theoretical foundation, limiting their effectiveness when extending to sequence lengths far beyond those seen during pretraining. This work addresses that gap with MrRoPE, a generalized positional-encoding formulation grounded in a mixed-radix perspective, which establishes the first unified theoretical framework for RoPE extrapolation. Within this framework, two training-free extension strategies, MrRoPE-Uni and MrRoPE-Pro, are proposed. The approach substantially enhances long-context modeling: it sustains over 85% recall on the 128K-length Needle-in-a-Haystack task and more than doubles YaRN's accuracy on the retrieval and dialogue subsets of InfiniteBench.
📝 Abstract
Rotary Position Embedding (RoPE) extension refers to modifying or generalizing the rotary position embedding scheme to handle sequences longer than those encountered during pre-training. However, current extension strategies are highly diverse and lack a unified theoretical foundation. In this paper, we propose MrRoPE (Mixed-radix RoPE), a generalized encoding formulation based on a radix-system conversion perspective, which elegantly unifies various RoPE-extension approaches as distinct radix-conversion strategies. Based on this theory, we introduce two training-free extensions, MrRoPE-Uni and MrRoPE-Pro, which leverage uniform and progressive radix-conversion strategies, respectively, to achieve 'train short, test long' generalization. Without fine-tuning, MrRoPE-Pro sustains over 85% recall in the 128K-context Needle-in-a-Haystack test and achieves more than double YaRN's accuracy on the InfiniteBench retrieval and dialogue subsets. Theoretical analysis confirms that MrRoPE-Pro effectively raises the upper bound of RoPE's attainable encoding length, further validating the reliability and utility of our theory and methodology.
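The mixed-radix intuition can be illustrated with a small sketch. This is a hypothetical toy example, not the paper's actual MrRoPE formulation: the function names, the choice of radices, and the digit decomposition below are all illustrative assumptions. It shows standard RoPE rotation angles alongside a mixed-radix decomposition of an integer position, under which a position far beyond a short training range is re-expressed as several small digits.

```python
# Hypothetical sketch of the mixed-radix view of positions (illustrative
# assumptions only; not the paper's actual MrRoPE method).

def rope_angles(pos, dim, base=10000.0):
    """Standard RoPE rotation angles for one position: pos * base^(-2i/dim)."""
    return [pos * base ** (-2 * i / dim) for i in range(dim // 2)]

def to_mixed_radix(pos, radices):
    """Decompose a nonnegative integer position into digits under the given
    mixed radices, least-significant digit first; the final entry carries
    any overflow beyond the listed radices."""
    digits = []
    for r in radices:
        digits.append(pos % r)
        pos //= r
    digits.append(pos)
    return digits

def from_mixed_radix(digits, radices):
    """Inverse of to_mixed_radix: recombine digits into a position."""
    pos, weight = 0, 1
    for d, r in zip(digits, radices):
        pos += d * weight
        weight *= r
    pos += digits[-1] * weight
    return pos

# A 100K-scale position becomes small digits under two radices of 512
# (radix choice is arbitrary here):
digits = to_mixed_radix(100_000, [512, 512])
print(digits)  # [160, 195, 0]
assert from_mixed_radix(digits, [512, 512]) == 100_000
```

Each digit stays within a bounded range regardless of the absolute position, which is the flavor of property a radix-conversion view of RoPE extension exploits.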