RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse

📅 2025-11-05
🤖 AI Summary
RAG systems suffer degraded performance in long-context and multi-turn settings due to high prefill overhead, and existing caching strategies fail to achieve high cache reuse and inference accuracy at the same time. This paper introduces RAGBoost, a context-reuse framework that preserves generation accuracy. It combines three core techniques: de-duplication of retrieved contexts across sessions, context indexing and ordering to maximize cache overlap, and lightweight contextual hints that preserve reasoning fidelity despite reordering. RAGBoost is compatible with mainstream LLM inference engines and requires no model architecture modifications. Experiments across diverse RAG and agentic tasks show that RAGBoost accelerates prefill by 1.5-3× while maintaining or improving generation accuracy, alleviating the efficiency bottleneck of RAG systems under long-context, high-frequency interaction.

📝 Abstract
Retrieval-augmented generation (RAG) enhances large language models (LLMs) with retrieved context but often suffers from downgraded prefill performance as modern applications demand longer and more complex inputs. Existing caching techniques either preserve accuracy with low cache reuse or improve reuse at the cost of degraded reasoning quality. We present RAGBoost, an efficient RAG system that achieves high cache reuse without sacrificing accuracy through accuracy-preserving context reuse. RAGBoost detects overlapping retrieved items across concurrent sessions and multi-turn interactions, using efficient context indexing, ordering, and de-duplication to maximize reuse, while lightweight contextual hints maintain reasoning fidelity. It integrates seamlessly with existing LLM inference engines and improves their prefill performance by 1.5-3X over state-of-the-art methods, while preserving or even enhancing reasoning accuracy across diverse RAG and agentic AI workloads. Our code is released at: https://github.com/Edinburgh-AgenticAI/RAGBoost.
Problem

Research questions and friction points this paper addresses.

Improves RAG efficiency by reusing context without accuracy loss
Addresses low cache reuse and degraded reasoning in existing systems
Enhances prefill performance while maintaining reasoning fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Detects overlapping retrieved items across sessions
Uses context indexing and de-duplication for reuse
Maintains reasoning fidelity with lightweight contextual hints
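The reuse idea above can be sketched in a few lines. This is an illustrative sketch, not RAGBoost's actual implementation: all function and variable names here are hypothetical. It assumes an inference engine that reuses KV cache only for shared prompt prefixes, so it de-duplicates retrieved items, keeps documents already in a session's cached prefix in their cached order, appends new documents in a canonical order, and emits a contextual hint recording the retriever's original relevance ranking.

```python
# Hypothetical sketch of accuracy-preserving context reuse.
# Prefix caches only hit when shared documents appear in the same order
# at the start of the prompt, so we reorder for reuse and compensate
# with a relevance hint.

def build_context(retrieved_ids, doc_store, session_history=None):
    """Assemble a retrieval context that favors prefix-cache reuse.

    retrieved_ids: doc IDs from the retriever, most relevant first.
    doc_store: maps doc ID -> document text.
    session_history: doc IDs already in this session's cached prefix.
    """
    # 1. De-duplicate repeated retrievals, preserving relevance order.
    seen, deduped = set(), []
    for doc_id in retrieved_ids:
        if doc_id not in seen:
            seen.add(doc_id)
            deduped.append(doc_id)

    # 2. Keep already-cached documents in their cached (prefix) order,
    #    then append new documents in a canonical (sorted) order so that
    #    concurrent sessions retrieving the same set produce identical prompts.
    history = session_history or []
    in_prefix = [d for d in history if d in seen]
    new_docs = sorted(d for d in deduped if d not in set(history))
    ordered = in_prefix + new_docs

    # 3. Lightweight contextual hint: record the retriever's original
    #    relevance ranking, since the reuse-friendly order may differ.
    rank = {d: i + 1 for i, d in enumerate(deduped)}
    hint = "Relevance order: " + ", ".join(f"{d} (#{rank[d]})" for d in ordered)

    body = "\n\n".join(doc_store[d] for d in ordered)
    return hint + "\n\n" + body, ordered
```

With this ordering, two sessions that retrieve overlapping documents share a common prompt prefix regardless of the order the retriever returned them in, while the hint lets the model recover the original ranking.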
👥 Authors
Yinsicheng Jiang (University of Edinburgh, United Kingdom)
Yeqi Huang (University of Edinburgh; ServerlessAI)
Liang Cheng (University of Edinburgh, United Kingdom)
Cheng Deng (University of Edinburgh; On-device LLM, NLP, GeoAI)
Xuan Sun (University of Edinburgh, United Kingdom)
Luo Mai (Associate Professor at University of Edinburgh; Computer Systems, Machine Learning, Data Management)