🤖 AI Summary
To address the significant performance degradation of large language models (LLMs) on inputs exceeding their pretrained context window, caused primarily by the out-of-distribution behavior of RoPE positions, this paper proposes a training-free, length-aware, multi-grained positional encoding method. The core innovation lies in (1) a parametric scaled sigmoid function that dynamically maps input length to the range of position indices used, and (2) a multi-grained attention mechanism that adaptively allocates positional-encoding resolution across different regions of the sequence, balancing fine-grained local modeling with long-range dependency capture. Fully compatible with standard RoPE architectures, the method is plug-and-play, requiring no architectural modification or fine-tuning. Extensive experiments across three mainstream LLMs and five long-context benchmarks demonstrate substantial improvements over existing length extrapolation techniques, enhancing long-context comprehension without additional training overhead.
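To make the length-aware mapping concrete, here is a minimal sketch of one plausible instantiation. The summary does not give the paper's formula, so the effective window size, the steepness parameter `k`, the tanh-style sigmoid scaling, and the final linear rescaling of indices are all illustrative assumptions, not LaMPE's actual design:

```python
import math

def mapped_length(input_len: int, eff_window: int = 8192, k: float = 2.0) -> int:
    """Scaled sigmoid from input length to the position range actually used.

    Near-identity while input_len is well below the effective window, and
    saturating at the window for very long inputs; eff_window and k are
    illustrative values, not the paper's parameters.
    """
    # 2*sigmoid(x) - 1 lies in [0, 1) and is ~x/2 near zero, so with k = 2
    # the mapping stays close to the identity for short inputs.
    s = 2.0 / (1.0 + math.exp(-k * input_len / eff_window)) - 1.0
    # Clamp as a safety net: never use more positions than there are tokens.
    return min(input_len, int(eff_window * s))

def remap_positions(input_len: int) -> list[float]:
    # Rescale raw indices so the whole input fits the mapped range,
    # analogous to linear position interpolation but with a target
    # length that depends dynamically on the input length.
    scale = mapped_length(input_len) / input_len
    return [i * scale for i in range(input_len)]
```

Under this sketch, a 16k-token input into an 8k effective window is compressed to roughly 0.49x spacing, while a 2k input is left almost untouched, which is the adaptive behavior the summary describes.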
📝 Abstract
Large language models (LLMs) experience significant performance degradation when the input exceeds the pretraining context window, primarily due to the out-of-distribution (OOD) behavior of Rotary Position Embedding (RoPE). Recent studies mitigate this problem by remapping OOD positions into the in-distribution range with fixed mapping strategies, ignoring the dynamic relationship between input length and the model's effective context window. To address this limitation, we propose Length-aware Multi-grained Positional Encoding (LaMPE), a training-free method that fully utilizes the model's effective context window for adaptive long-context scaling in LLMs. Motivated by the left-skewed frequency distribution of relative positions, LaMPE establishes a dynamic relationship between mapping length and input length through a parametric scaled sigmoid function, adaptively allocating positional capacity across varying input lengths. In addition, LaMPE devises a novel multi-grained attention mechanism that strategically allocates positional resolution across different sequence regions to capture both fine-grained locality and long-range dependencies. Our method can be seamlessly applied to a wide range of RoPE-based LLMs without training. Extensive experiments on three representative LLMs across five mainstream long-context benchmarks demonstrate that LaMPE achieves significant performance improvements over existing length extrapolation methods. The code will be released at https://github.com/scar-on/LaMPE.
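As a rough sketch of the multi-grained idea under stated assumptions: the abstract only says that positional resolution is allocated differently across sequence regions, so the two-level piecewise-linear form below, the `local` window size, and the `mapped_len` budget are hypothetical choices for illustration, not the paper's exact mechanism:

```python
import torch

def multi_grained_positions(seq_len: int, mapped_len: int = 8192,
                            local: int = 1024) -> torch.Tensor:
    """Remap absolute position indices before computing RoPE angles.

    A two-granularity sketch: the final `local` positions keep unit spacing
    (fine-grained locality near the query), while earlier positions are
    uniformly compressed into the remaining budget (coarse long-range
    granularity). All remapped indices stay below `mapped_len`.
    """
    pos = torch.arange(seq_len, dtype=torch.float32)
    if seq_len <= mapped_len:
        return pos  # already in-distribution: no remapping needed
    # Compress the distant region into the budget left after reserving
    # `local` unit-spaced slots at the end of the sequence.
    scale = (mapped_len - local) / (seq_len - local)
    coarse = pos[:-local] * scale
    fine = coarse[-1] + 1.0 + torch.arange(local, dtype=torch.float32)
    return torch.cat([coarse, fine])
```

These remapped indices would then replace the raw positions when computing rotary angles, so a query distinguishes nearby tokens at full positional resolution while distant tokens are separated more coarsely, matching the left-skewed usage of relative positions that motivates the method.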