Sentence-Anchored Gist Compression for Long-Context LLMs

📅 2025-11-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address excessive memory and computational overhead in long-context processing by large language models (LLMs), this paper proposes a sentence-anchored gist compression mechanism. It employs learnable compression tokens to semantically condense context while leveraging sentence-level anchoring for precise alignment, enabling efficient and controllable context reduction. The method significantly outperforms unsupervised compression baselines across 2×–8× compression ratios, maintaining stable performance on both short- and long-context benchmarks. Experiments on a 3B-parameter LLaMA model demonstrate that higher compression ratios incur no substantial performance degradation, striking an effective balance between compression efficiency and task fidelity. This approach establishes a lightweight, scalable paradigm for long-context inference—offering improved resource efficiency without sacrificing semantic integrity or downstream accuracy.

📝 Abstract
This work investigates context compression for Large Language Models (LLMs) using learned compression tokens to reduce the memory and computational demands of processing long sequences. We demonstrate that pre-trained LLMs can be fine-tuned to compress their context by factors of 2x to 8x without significant performance degradation, as evaluated on both short-context and long-context benchmarks. Furthermore, in experiments on a 3-billion-parameter LLaMA model, our method achieves results on par with alternative compression techniques while attaining higher compression ratios.
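The sentence-anchored mechanism can be pictured as interleaving learnable gist tokens after each sentence, so that later tokens attend only to the gists. A minimal sketch, not the authors' implementation: the `GIST` marker, the whitespace "tokenizer", and the per-sentence allocation rule are all assumptions for illustration.

```python
import re

GIST = "<gist>"  # placeholder for a learnable compression token


def build_gisted_sequence(text, compression_ratio=4):
    """Append ceil(len(sentence_tokens)/ratio) gist tokens after each sentence.

    After fine-tuning, subsequent tokens would attend only to the gist
    tokens, shrinking the effective context by roughly `compression_ratio`.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    sequence, n_original, n_gist = [], 0, 0
    for sent in sentences:
        tokens = sent.split()  # toy whitespace "tokenizer"
        k = max(1, -(-len(tokens) // compression_ratio))  # ceil division
        sequence.extend(tokens)
        sequence.extend([GIST] * k)  # sentence-level anchor: gists follow the sentence
        n_original += len(tokens)
        n_gist += k
    return sequence, n_original, n_gist


seq, n_orig, n_gists = build_gisted_sequence(
    "LLMs struggle with long contexts. Gist tokens condense them.",
    compression_ratio=4,
)
# 9 original tokens are anchored by 3 gist tokens (2 + 1 per sentence)
```

Sentence-level anchoring is what makes the reduction controllable: the number of gist slots per sentence is fixed by the target ratio rather than learned freely.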
Problem

Research questions and friction points this paper is trying to address.

Reducing memory and computational demands for long-sequence LLM processing
Fine-tuning LLMs to compress context by 2x-8x without performance loss
Achieving higher compression ratios than alternative techniques on benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tunes LLMs with learned compression tokens
Achieves 2x to 8x context compression ratios
Maintains performance comparable to alternative techniques
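At the reported 2x-8x ratios, the KV-cache footprint shrinks inversely with the compression ratio. A back-of-the-envelope sketch; the layer, head, and precision figures below are assumed LLaMA-3B-like values, not numbers reported in the paper:

```python
def kv_cache_bytes(seq_len, n_layers=28, n_kv_heads=8, head_dim=128, bytes_per=2):
    # Per token and layer, the cache holds K and V: 2 * heads * head_dim values,
    # each stored at `bytes_per` bytes (2 for fp16/bf16).
    return seq_len * n_layers * 2 * n_kv_heads * head_dim * bytes_per


full = kv_cache_bytes(32_768)  # uncompressed 32k-token context
for ratio in (2, 4, 8):
    compressed = kv_cache_bytes(32_768 // ratio)
    print(f"{ratio}x compression -> {compressed / full:.2%} of full KV cache")
```

Since cache size is linear in sequence length, an 8x ratio keeps only 12.5% of the full cache, which is the resource saving the summary refers to.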
Dmitrii Tarasov (HSE)
Elizaveta Goncharova (FusionBrainLab, HSE University)
Kuznetsov Andrey (FusionBrainLab, Innopolis University)