AI Summary
Retrieval-Augmented Generation (RAG) faces two key challenges: limited effective context length and redundancy in retrieved documents. Naive compression risks losing fine-grained facts, while raw passages lack global semantic coherence. To address this, we propose a dual-level context representation that jointly encodes natural-language snippets (preserving critical factual details) and interpretable semantic compression vectors (capturing holistic document structure). We further introduce dynamic evidence reranking based on these compression vectors, enabling joint optimization of local fidelity and knowledge completeness within constrained context budgets. Our method integrates natural-language extraction, semantic vector compression, and iterative evidence selection into a unified RAG framework. Evaluated across nine benchmark datasets and five open-source large language models, it improves answer relevance by +17.71, answer correctness by +13.72, and semantic similarity by +15.53 over strong baselines.
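The dual-level representation described above can be illustrated with a toy sketch. The paper does not specify how spans are extracted or how compression vectors are produced, so the heuristics below (keeping sentences with numbers or capitalized entity-like tokens, and a hashed bag-of-words vector in place of a learned encoder) are placeholder assumptions, not SARA's actual components:

```python
import re

def extract_spans(doc: str) -> list[str]:
    """Fine-grained level: keep sentences likely to carry factual detail.
    Crude proxy: sentences containing a number, or a capitalized
    (entity-like) word beyond the sentence-initial position."""
    sentences = re.split(r"(?<=[.!?])\s+", doc)
    keep = []
    for s in sentences:
        words = s.split()
        has_number = any(any(c.isdigit() for c in w) for w in words)
        has_entity = any(w[0].isupper() for w in words[1:])
        if has_number or has_entity:
            keep.append(s)
    return keep

def compress(doc: str, dim: int = 8) -> list[float]:
    """Coarse level: a fixed-size 'compression vector'. Here a hashed,
    L2-normalized bag of words stands in for a learned semantic encoder."""
    vec = [0.0] * dim
    for tok in doc.lower().split():
        vec[hash(tok) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

doc = ("Paris is the capital of France. "
       "It hosted 2 Olympic Games. "
       "The food is nice.")
spans = extract_spans(doc)   # keeps the two fact-bearing sentences
vector = compress(doc)       # unit-norm summary vector
```

A real system would replace both heuristics with trained models; the point is only that each document carries two complementary representations, one textual and one vectorial.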
Abstract
Retrieval-Augmented Generation (RAG) extends large language models (LLMs) with external knowledge but faces key challenges: restricted effective context length and redundancy in retrieved documents. Pure compression-based approaches reduce input size but often discard fine-grained details essential for factual accuracy. We propose SARA, a unified RAG framework that balances local precision and global knowledge coverage under tight context budgets. SARA combines natural-language text snippets with semantic compression vectors to jointly enhance context efficiency and answer correctness. It represents contexts at two complementary levels: (1) fine-grained natural-language spans that preserve critical entities and numerical values, and (2) compact, interpretable vectors that summarize high-level semantics. An iterative evidence-selection module employs the compression vectors for dynamic reranking of contexts. Across 9 datasets and 5 open-source LLMs spanning 3 model families (Mistral, Llama, and Gemma), SARA consistently improves answer relevance (+17.71), answer correctness (+13.72), and semantic similarity (+15.53), demonstrating the importance of integrating textual and compressed representations for robust, context-efficient RAG.
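The iterative evidence-selection idea can be sketched as greedy selection over compression vectors under a token budget. SARA's actual objective is not given in the abstract, so the MMR-style trade-off between query relevance and redundancy below (with a hypothetical weight `lam`) is an illustrative assumption:

```python
def cos(a: list[float], b: list[float]) -> float:
    """Cosine similarity for unit-norm vectors (plain dot product)."""
    return sum(x * y for x, y in zip(a, b))

def select_evidence(query_vec, docs, budget, lam=0.7):
    """Greedy MMR-style selection: at each step pick the document that
    best trades off relevance to the query against redundancy with
    already-chosen evidence, stopping once the token budget is hit.
    `docs` is a list of (text, compression_vector, n_tokens) triples."""
    chosen, used = [], 0
    remaining = list(docs)
    while remaining:
        def score(d):
            relevance = cos(query_vec, d[1])
            redundancy = max((cos(d[1], c[1]) for c in chosen), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        if used + best[2] > budget:
            break  # next-best document would exceed the context budget
        chosen.append(best)
        used += best[2]
        remaining.remove(best)
    return [d[0] for d in chosen]

query = [1.0, 0.0]
docs = [("a", [1.0, 0.0], 5),
        ("b", [1.0, 0.0], 5),   # redundant with "a"
        ("c", [0.0, 1.0], 5)]   # irrelevant to the query
print(select_evidence(query, docs, budget=10))  # ['a', 'b']
```

The selected texts (or their extracted spans) would then be concatenated into the LLM prompt, keeping the total within the fixed context budget.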