🤖 AI Summary
To address the high computational overhead of long contexts in retrieval-augmented generation (RAG), this paper proposes a lightweight soft context compression method. Instead of a complex, learnable compression module, it uses mean pooling to map the input sequence into a shorter sequence of continuous, dense representations. The compressor can be trained jointly across multiple compression ratios, enabling flexible adaptation to varying context lengths and large language model (LLM) scales, and the pooling step itself introduces no additional parameters, reducing both computational and memory costs. Extensive evaluation on multiple open-domain question answering benchmarks, across diverse LLM sizes, demonstrates its effectiveness: the input sequence is shortened by up to 8× with only a marginal performance drop (below 1.5% on average), strong generalization, and straightforward deployment.
📝 Abstract
A common strategy to reduce the computational costs of using long contexts in retrieval-augmented generation (RAG) with large language models (LLMs) is soft context compression, where the input sequence is transformed into a shorter continuous representation. We develop a lightweight and simple mean-pooling approach that consistently outperforms the widely used compression-tokens architecture, and study training the same compressor to output multiple compression ratios. We conduct extensive experiments across in-domain and out-of-domain QA datasets, as well as across model families, scales, and compression ratios. Overall, our simple mean-pooling approach achieves the strongest performance, with a relatively small drop when training for multiple compression ratios. More broadly though, across architectures and training regimes the trade-offs are more nuanced, illustrating the complex landscape of compression methods.
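To make the core idea concrete, here is a minimal sketch of mean pooling as a sequence compressor: hidden-state vectors are averaged over non-overlapping windows of size equal to the compression ratio, so a sequence of length `n` becomes roughly `n / ratio` soft vectors. This is an illustrative toy in plain Python (the function name and list-of-lists representation are assumptions, not the paper's implementation, which operates on LLM embeddings):

```python
def mean_pool_compress(states, ratio):
    """Compress a sequence of vectors by averaging non-overlapping
    windows of `ratio` consecutive vectors.

    states: list of equal-length vectors (list of lists of floats),
            e.g. per-token hidden states.
    ratio:  compression ratio; a trailing partial window is averaged
            over however many vectors it contains.
    Returns a list of ceil(len(states) / ratio) pooled vectors.
    """
    compressed = []
    for start in range(0, len(states), ratio):
        window = states[start:start + ratio]
        dim = len(window[0])
        # Component-wise mean over the window.
        pooled = [sum(vec[j] for vec in window) / len(window) for j in range(dim)]
        compressed.append(pooled)
    return compressed

# 8 toy "token states" of dimension 2, compressed 4x into 2 soft vectors.
states = [[float(i), float(i)] for i in range(8)]
print(mean_pool_compress(states, 4))  # [[1.5, 1.5], [5.5, 5.5]]
```

Because pooling is a fixed averaging operation, it adds no trainable parameters; in the paper's setting, the same compressor is trained to support several values of `ratio` at once, trading context length against answer quality.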