AI Summary
Retrieval-Augmented Generation (RAG) faces two key challenges: limited effective context length and redundancy in retrieved documents. Naive compression risks losing fine-grained facts, while raw passages lack global semantic coherence. To address this, we propose a dual-level context representation that jointly encodes natural-language snippets (preserving critical factual details) and interpretable semantic compression vectors (capturing holistic document structure). We further introduce dynamic evidence reranking based on these compression vectors, enabling joint optimization of local fidelity and knowledge completeness within constrained context budgets. Our method integrates natural-language extraction, semantic vector compression, and iterative evidence selection into a unified RAG framework. Evaluated across nine benchmark datasets and five open-source large language models, it improves answer relevance by +17.71, answer correctness by +13.72, and semantic similarity by +15.53 over strong baselines.
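The dual-level representation described above can be illustrated with a toy sketch. The paper does not specify how spans are extracted or how compression vectors are produced, so the heuristics below (keeping sentences with numbers or capitalized entity-like tokens, and a hashed bag-of-words vector in place of a learned encoder) are placeholder assumptions, not SARA's actual components:

```python
import re

def extract_spans(doc: str) -> list[str]:
    """Fine-grained level: keep sentences likely to carry factual detail.
    Crude proxy: sentences containing a number, or a capitalized
    (entity-like) word beyond the sentence-initial position."""
    sentences = re.split(r"(?<=[.!?])\s+", doc)
    keep = []
    for s in sentences:
        words = s.split()
        has_number = any(any(c.isdigit() for c in w) for w in words)
        has_entity = any(w[0].isupper() for w in words[1:])
        if has_number or has_entity:
            keep.append(s)
    return keep

def compress(doc: str, dim: int = 8) -> list[float]:
    """Coarse level: a fixed-size 'compression vector'. Here a hashed,
    L2-normalized bag of words stands in for a learned semantic encoder."""
    vec = [0.0] * dim
    for tok in doc.lower().split():
        vec[hash(tok) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

doc = ("Paris is the capital of France. "
       "It hosted 2 Olympic Games. "
       "The food is nice.")
spans = extract_spans(doc)   # keeps the two fact-bearing sentences
vector = compress(doc)       # unit-norm summary vector
```

A real system would replace both heuristics with trained models; the point is only that each document carries two complementary representations, one textual and one vectorial.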
Abstract
Retrieval-Augmented Generation (RAG) extends large language models (LLMs) with external knowledge but faces key challenges: restricted effective context length and redundancy in retrieved documents. Pure compression-based approaches reduce input size but often discard fine-grained details essential for factual accuracy. We propose SARA, a unified RAG framework that balances local precision and global knowledge coverage under tight context budgets. SARA combines natural-language text snippets with semantic compression vectors to jointly enhance context efficiency and answer correctness. It represents contexts at two complementary levels: (1) fine-grained natural-language spans that preserve critical entities and numerical values, and (2) compact, interpretable vectors that summarize high-level semantics. An iterative evidence-selection module employs the compression vectors for dynamic reranking of contexts. Across 9 datasets and 5 open-source LLMs spanning 3 model families (Mistral, Llama, and Gemma), SARA consistently improves answer relevance (+17.71), answer correctness (+13.72), and semantic similarity (+15.53), demonstrating the importance of integrating textual and compressed representations for robust, context-efficient RAG.
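The iterative evidence-selection idea can be sketched as greedy selection over compression vectors under a token budget. SARA's actual objective is not given in the abstract, so the MMR-style trade-off between query relevance and redundancy below (with a hypothetical weight `lam`) is an illustrative assumption:

```python
def cos(a: list[float], b: list[float]) -> float:
    """Cosine similarity for unit-norm vectors (plain dot product)."""
    return sum(x * y for x, y in zip(a, b))

def select_evidence(query_vec, docs, budget, lam=0.7):
    """Greedy MMR-style selection: at each step pick the document that
    best trades off relevance to the query against redundancy with
    already-chosen evidence, stopping once the token budget is hit.
    `docs` is a list of (text, compression_vector, n_tokens) triples."""
    chosen, used = [], 0
    remaining = list(docs)
    while remaining:
        def score(d):
            relevance = cos(query_vec, d[1])
            redundancy = max((cos(d[1], c[1]) for c in chosen), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        if used + best[2] > budget:
            break  # next-best document would exceed the context budget
        chosen.append(best)
        used += best[2]
        remaining.remove(best)
    return [d[0] for d in chosen]

query = [1.0, 0.0]
docs = [("a", [1.0, 0.0], 5),
        ("b", [1.0, 0.0], 5),   # redundant with "a"
        ("c", [0.0, 1.0], 5)]   # irrelevant to the query
print(select_evidence(query, docs, budget=10))  # ['a', 'b']
```

The selected texts (or their extracted spans) would then be concatenated into the LLM prompt, keeping the total within the fixed context budget.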