LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs

📅 2025-02-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) face a fundamental bottleneck in processing long contexts due to fixed positional encoding limits and the quadratic computational complexity of self-attention. To address this, we propose QD-LCIRC—a fine-tuning-free, iterative context compression mechanism. QD-LCIRC integrates query-aware salient span selection with attention-masked recursive compression, enabling dynamic, adaptive context reduction via lightweight projection layers while preserving global semantics and enhancing query relevance. Evaluated on 32K-context multi-document QA and long-range reasoning tasks, QD-LCIRC improves accuracy by 12.7%, reduces GPU memory consumption by 41%, and incurs only an 8% increase in inference latency. To our knowledge, this is the first work to synergistically combine query-dependent compression with an iterative architectural design, achieving efficient, scalable, and training-free long-context enhancement for LLMs.

Technology Category

Application Category

📝 Abstract
While large language models (LLMs) excel in generating coherent and contextually rich outputs, their capacity to efficiently handle long-form contexts is limited by fixed-length position embeddings. Additionally, the computational cost of processing long sequences increases quadratically, making it challenging to extend context length. To address these challenges, we propose Long-form Context Injection with Recurrent Compression (LCIRC), a method that enables the efficient processing long-form sequences beyond the model's length limit through recurrent compression without retraining the entire model. We further introduce query dependent context modeling, which selectively compresses query-relevant information, ensuring that the model retains the most pertinent content. Our empirical results demonstrate that Query Dependent LCIRC (QD-LCIRC) significantly improves LLM's ability to manage extended contexts, making it well-suited for tasks that require both comprehensive context understanding and query relevance.
Problem

Research questions and friction points this paper is trying to address.

Efficient long-form context handling
Recurrent compression without retraining
Query dependent context modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Recurrent compression for long contexts
Query dependent context modeling
Efficient processing beyond length limits
🔎 Similar Papers
2024-10-05Conference on Empirical Methods in Natural Language ProcessingCitations: 0