🤖 AI Summary
Existing context compression methods—such as gist or memory tokens—often overlook the local inductive bias introduced by positional encoding, thereby impairing global dependency modeling. This work is the first to identify the critical impact of position ID design on compression performance and proposes a lightweight position-layout method that solely reorders position IDs, requiring no architectural modifications or changes to training objectives. By remapping positional encodings, the method explicitly models long-range dependencies between compressed tokens and the full context, and integrates seamlessly with gist/memory token architectures. Empirically, it achieves an average ROUGE-1 F1 gain of 1.9 on out-of-domain question answering and an average accuracy gain of 2.6 points for visual context compression in vision-augmented LLMs. This work establishes a simple, non-intrusive recipe for efficient context compression.
📝 Abstract
Using special tokens (e.g., gist, memory, or compressed tokens) to compress context information is a common practice for large language models (LLMs). However, existing approaches often neglect that position encodings inherently induce local inductive biases in models, causing the compression process to ignore holistic contextual dependencies. We propose Enhanced Position Layout (EPL), a simple yet effective method that improves the context compression capability of LLMs by only adjusting position IDs, the numerical identifiers that specify token positions. EPL minimizes the distance between context tokens and their corresponding special tokens while maintaining the sequence order in position IDs among context tokens, special tokens, and the subsequent tokens. Integrating EPL into our best-performing context compression model yields a 1.9 ROUGE-1 F1 improvement on out-of-domain question answering datasets on average. When extended to multimodal scenarios, EPL brings an average accuracy gain of 2.6 points to vision compression LLMs.
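To make the idea concrete, below is a minimal, hypothetical sketch of an EPL-style position-ID layout; the exact scheme is our assumption, not taken from the paper. The context is split into chunks, each summarized by one special (gist) token. The sketch packs gist position IDs consecutively and places each chunk's token IDs immediately before its own gist token, so every context token is close to its special token while order within each chunk, then gist, then subsequent tokens is preserved.

```python
def epl_position_ids(chunk_lens, n_subsequent):
    """Hypothetical EPL-style layout (illustrative only).

    chunk_lens:   length of each context chunk (one gist token per chunk)
    n_subsequent: number of tokens following the last gist token
    Returns per-chunk context IDs, gist-token IDs, and subsequent-token IDs.
    """
    max_len = max(chunk_lens)
    # Gist tokens get consecutive position IDs just past the longest chunk.
    gist_pos = [max_len + k for k in range(len(chunk_lens))]

    context_ids = []
    for k, length in enumerate(chunk_lens):
        # Chunk k occupies the `length` positions directly before its gist
        # token, so its last token is at distance 1 from the gist token.
        start = gist_pos[k] - length
        context_ids.append(list(range(start, gist_pos[k])))

    # Subsequent tokens continue right after the final gist token.
    next_start = gist_pos[-1] + 1
    subsequent_ids = list(range(next_start, next_start + n_subsequent))
    return context_ids, gist_pos, subsequent_ids
```

Note that position IDs may be reused across chunks (each chunk only attends locally to its own gist token under this layout), which is what keeps every context token near its special token without renumbering the whole sequence.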