🤖 AI Summary
To address the memory and computational overhead of long-context processing in large language models (LLMs), this paper proposes a sentence-anchored gist compression mechanism. It employs learnable compression tokens to condense the context semantically, using sentence-level anchoring for precise alignment, which enables efficient and controllable context reduction. The method significantly outperforms unsupervised compression baselines at 2×–8× compression ratios, maintaining stable performance on both short- and long-context benchmarks. Experiments on a 3B-parameter LLaMA model show that higher compression ratios incur no substantial performance degradation, striking an effective balance between compression efficiency and task fidelity. The approach offers a lightweight, scalable path to long-context inference, improving resource efficiency without sacrificing semantic integrity or downstream accuracy.
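The core idea above can be illustrated with a minimal layout sketch. This is an assumption-laden toy, not the paper's implementation: `GIST` stands in for a learnable compression token, one (or more) gist tokens are appended after each sentence (the "sentence anchor"), and after compression only the gist positions are retained in the KV cache, so the achieved compression ratio is roughly the average sentence length divided by the gists per sentence.

```python
GIST = "<gist>"  # hypothetical placeholder for a learnable compression token

def compress_layout(sentences, gists_per_sentence=1):
    """Build the compressed-context layout.

    sentences: list of token lists, one per sentence.
    Returns (full_sequence, kept_positions, compression_ratio), where
    kept_positions are the gist-token indices retained after compression.
    """
    seq, kept = [], []
    for sent in sentences:
        seq.extend(sent)                      # original sentence tokens
        for _ in range(gists_per_sentence):   # sentence-anchored gist(s)
            kept.append(len(seq))
            seq.append(GIST)
    ratio = len(seq) / len(kept)              # tokens in vs. tokens kept
    return seq, kept, ratio

# Example: two sentences, one gist each
seq, kept, ratio = compress_layout([["the", "cat", "sat"], ["dogs", "bark"]])
# kept == [3, 6]: only the two gist positions survive compression
```

Varying `gists_per_sentence` is one way to trade compression ratio against fidelity, mirroring the 2×–8× range reported in the paper.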
📝 Abstract
This work investigates context compression for Large Language Models (LLMs) using learned compression tokens to reduce the memory and computational demands of processing long sequences. We demonstrate that pre-trained LLMs can be fine-tuned to compress their context by factors of 2× to 8× without significant performance degradation, as evaluated on both short-context and long-context benchmarks. Furthermore, in experiments on a 3-billion-parameter LLaMA model, our method achieves results on par with alternative compression techniques while attaining higher compression ratios.