Long-Short Alignment for Effective Long-Context Modeling in LLMs

📅 2025-06-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) face a fundamental challenge in long-context modeling: poor length generalization. This paper introduces a "long-short alignment" perspective, identifying output-distribution consistency, rather than input encoding design, as the root cause of length generalization failure. To formalize this insight, the authors propose the Long-Short Misalignment (LSM) metric, which quantifies the divergence between output distributions produced from short and long contexts, and derive from it a differentiable alignment regularization loss. Through output-distribution analysis, synthetic-task validation, and alignment-aware training within the Transformer architecture, the method significantly improves generalization on sequences far longer than those seen during training and achieves state-of-the-art performance across diverse long-context tasks under standard long-context extrapolation evaluation protocols. The implementation is publicly available.

📝 Abstract
Large language models (LLMs) have exhibited impressive performance and surprising emergent properties. However, their effectiveness remains limited by the fixed context window of the transformer architecture, posing challenges for long-context modeling. Among these challenges, length generalization -- the ability to generalize to sequences longer than those seen during training -- is a classical and fundamental problem. In this work, we propose a fresh perspective on length generalization, shifting the focus from the conventional emphasis on input features such as positional encodings or data structures to the output distribution of the model. Specifically, through case studies on synthetic tasks, we highlight the critical role of long-short alignment -- the consistency of output distributions across sequences of varying lengths. Extending this insight to natural language tasks, we propose a metric called Long-Short Misalignment to quantify this phenomenon, uncovering a strong correlation between the metric and length generalization performance. Building on these findings, we develop a regularization term that promotes long-short alignment during training. Extensive experiments validate the effectiveness of our approach, offering new insights for achieving more effective long-context modeling in LLMs. Code is available at https://github.com/PKU-ML/LongShortAlignment.
Problem

Research questions and friction points this paper is trying to address.

Addresses length generalization in LLMs for long-context modeling
Proposes long-short alignment to improve output distribution consistency
Introduces a regularization term to enhance long-context performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Focuses on output distribution alignment
Introduces Long-Short Misalignment metric
Develops regularization for long-short alignment
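The paper's exact definition of the Long-Short Misalignment metric lives in the paper itself; as an illustrative sketch only (all names hypothetical), the core idea of output-distribution consistency can be pictured as a symmetric KL divergence between the next-token distributions a model produces from a short context and from a lengthened version of the same context:

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q), with a small epsilon for numerical safety."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def long_short_misalignment(short_logits, long_logits):
    """Symmetric divergence between output distributions from a short and a
    long context. A sketch of the general idea, not the paper's definition."""
    p, q = softmax(short_logits), softmax(long_logits)
    return 0.5 * (kl(p, q) + kl(q, p))
```

A score near zero means the model's outputs agree across context lengths (good alignment); a large score flags misalignment. A training-time regularizer in this spirit would add such a divergence term, computed between short- and long-context forward passes, to the language-modeling loss.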