Layer-Specific Scaling of Positional Encodings for Superior Long-Context Modeling

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the prevalent "lost-in-the-middle" problem, where token representations at intermediate positions deteriorate in long-context modeling by large language models, this paper proposes a layer-specific RoPE scaling mechanism. It assigns distinct scaling factors to each Transformer layer to mitigate the long-range decay inherent in Rotary Position Embeddings (RoPE). The authors introduce a joint optimization framework that parameterizes the scaling factors via Bézier curves and employs a genetic algorithm to efficiently search for near-optimal configurations. The method integrates seamlessly with mainstream RoPE extrapolation techniques, including Position Interpolation (PI) and Dynamic-NTK. On the Key-Value Retrieval benchmark, it achieves an accuracy improvement of up to 20% on average, substantially enhancing both middle-position representation fidelity and long-context generalization. This work establishes an adaptive, layer-aware approach to RoPE calibration.
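The core idea above can be sketched in a few lines: RoPE rotates query/key pairs by position-dependent angles, and dividing positions by a scaling factor (as in Position Interpolation) slows the long-range decay. The paper's twist is giving each Transformer layer its own factor. A minimal sketch, where the `layer_scales` values are hypothetical, not taken from the paper:

```python
import numpy as np

def rope_angles(positions, dim, scale=1.0, base=10000.0):
    """RoPE rotation angles with a positional scaling factor.

    Dividing positions by `scale` (as in Position Interpolation)
    compresses effective positions, slowing the decay of attention
    scores with token distance.
    """
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(np.asarray(positions) / scale, inv_freq)  # shape (len, dim/2)

# Layer-specific scaling: each Transformer layer gets its own factor
# instead of one global value. These factors are illustrative only.
layer_scales = [1.0, 1.2, 1.5, 2.0]
positions = np.arange(8)
angles_per_layer = [rope_angles(positions, dim=64, scale=s) for s in layer_scales]
```

A larger per-layer scale yields smaller rotation angles at the same position, which is what lets deeper (or shallower) layers attend more evenly across the middle of the context.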

📝 Abstract
Although large language models (LLMs) have achieved significant progress in handling long-context inputs, they still suffer from the "lost-in-the-middle" problem, where crucial information in the middle of the context is often underrepresented or lost. Our extensive experiments reveal that this issue may arise from the rapid long-term decay in Rotary Position Embedding (RoPE). To address this problem, we propose a layer-specific positional encoding scaling method that assigns distinct scaling factors to each layer, slowing down the decay rate caused by RoPE to make the model pay more attention to the middle context. A specially designed genetic algorithm is employed to efficiently select the optimal scaling factors for each layer by incorporating Bezier curves to reduce the search space. Through comprehensive experimentation, we demonstrate that our method significantly alleviates the "lost-in-the-middle" problem. Our approach results in an average accuracy improvement of up to 20% on the Key-Value Retrieval dataset. Furthermore, we show that layer-specific interpolation, as opposed to uniform interpolation across all layers, enhances the model's extrapolation capabilities when combined with PI and Dynamic-NTK positional encoding schemes.
Problem

Research questions and friction points this paper is trying to address.

Addresses the 'lost-in-the-middle' problem in LLMs.
Proposes layer-specific scaling for positional encodings.
Improves Key-Value Retrieval accuracy by up to 20% on average.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer-specific scaling of positional encodings
Genetic algorithm for optimal scaling factor selection
Bezier curves to reduce search space complexity
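The last two innovations fit together: instead of searching one scaling factor per layer, the per-layer factors are read off a cubic Bézier curve, so the genetic algorithm only evolves four control values. A minimal sketch of this combination; the control-value ranges, GA hyperparameters, and the toy fitness function are assumptions for illustration, not the paper's actual setup (which scores candidates on benchmark accuracy):

```python
import random
import numpy as np

def bezier_scales(ctrl, num_layers):
    """Per-layer scaling factors from a cubic Bezier curve (4 control values)."""
    p = np.asarray(ctrl, dtype=float)
    t = np.linspace(0.0, 1.0, num_layers)
    return ((1 - t) ** 3 * p[0] + 3 * (1 - t) ** 2 * t * p[1]
            + 3 * (1 - t) * t ** 2 * p[2] + t ** 3 * p[3])

def genetic_search(fitness, pop_size=20, generations=30, seed=0):
    """Minimal GA over 4 Bezier control values (assumed range [1, 4])."""
    rng = random.Random(seed)
    pop = [[rng.uniform(1, 4) for _ in range(4)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # selection: keep top half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]  # crossover: average
            child[rng.randrange(4)] += rng.gauss(0, 0.1)  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy fitness standing in for benchmark accuracy: prefer curves that
# start near 1x and end near 2x scaling (purely illustrative).
def toy_fitness(ctrl):
    s = bezier_scales(ctrl, num_layers=32)
    return -abs(s[0] - 1.0) - abs(s[-1] - 2.0)

best = genetic_search(toy_fitness)
```

The payoff of the Bézier parameterization is that the search space shrinks from one factor per layer (e.g. 32 dimensions) to 4, and the resulting factors vary smoothly across layers rather than jumping arbitrarily.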
👥 Authors
Zhenghua Wang
Research Associate (Rice University)
Risk-based design of structures and infrastructure systems under multiple hazards
Yiran Ding
HDU
LLM, MLSys
Changze Lv
Fudan University, Shanghai Key Laboratory of Intelligent Information Processing
Zhibo Xu
Fudan University
Large language models, agent RL
Tianlong Li
Fudan University, Shanghai Key Laboratory of Intelligent Information Processing
Tianyuan Shi
Sun Yat-sen University
NLP
Xiaoqing Zheng
Fudan University
Natural Language Processing and Machine Learning
Xuanjing Huang
Fudan University, Shanghai Key Laboratory of Intelligent Information Processing