RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse

📅 2025-11-05
🤖 AI Summary
RAG systems suffer degraded performance in long-context and multi-turn settings due to high prefill overhead, and existing caching strategies fail to achieve high cache reuse and inference accuracy at the same time. This paper introduces RAGBoost, a context-reuse framework that preserves generation accuracy. It combines three core techniques: de-duplication of retrieved contexts across sessions, context indexing and ordering to maximize cache overlap, and lightweight contextual hints that preserve reasoning fidelity despite reordering. RAGBoost is compatible with mainstream LLM inference engines and requires no model architecture modifications. Experiments across diverse RAG and agentic tasks show that RAGBoost accelerates prefill by 1.5-3× while maintaining or improving generation accuracy, alleviating the efficiency bottleneck of RAG systems under long-context, high-frequency interaction.

📝 Abstract
Retrieval-augmented generation (RAG) enhances large language models (LLMs) with retrieved context but often suffers from downgraded prefill performance as modern applications demand longer and more complex inputs. Existing caching techniques either preserve accuracy with low cache reuse or improve reuse at the cost of degraded reasoning quality. We present RAGBoost, an efficient RAG system that achieves high cache reuse without sacrificing accuracy through accuracy-preserving context reuse. RAGBoost detects overlapping retrieved items across concurrent sessions and multi-turn interactions, using efficient context indexing, ordering, and de-duplication to maximize reuse, while lightweight contextual hints maintain reasoning fidelity. It integrates seamlessly with existing LLM inference engines and improves their prefill performance by 1.5-3X over state-of-the-art methods, while preserving or even enhancing reasoning accuracy across diverse RAG and agentic AI workloads. Our code is released at: https://github.com/Edinburgh-AgenticAI/RAGBoost.
Problem

Research questions and friction points this paper addresses.

Improves RAG efficiency by reusing context without accuracy loss
Addresses low cache reuse and degraded reasoning in existing systems
Enhances prefill performance while maintaining reasoning fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Detects overlapping retrieved items across sessions
Uses context indexing and de-duplication for reuse
Maintains reasoning fidelity with lightweight contextual hints
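The reuse idea above can be sketched in a few lines. This is an illustrative sketch, not RAGBoost's actual implementation: all function and variable names here are hypothetical. It assumes an inference engine that reuses KV cache only for shared prompt prefixes, so it de-duplicates retrieved items, keeps documents already in a session's cached prefix in their cached order, appends new documents in a canonical order, and emits a contextual hint recording the retriever's original relevance ranking.

```python
# Hypothetical sketch of accuracy-preserving context reuse.
# Prefix caches only hit when shared documents appear in the same order
# at the start of the prompt, so we reorder for reuse and compensate
# with a relevance hint.

def build_context(retrieved_ids, doc_store, session_history=None):
    """Assemble a retrieval context that favors prefix-cache reuse.

    retrieved_ids: doc IDs from the retriever, most relevant first.
    doc_store: maps doc ID -> document text.
    session_history: doc IDs already in this session's cached prefix.
    """
    # 1. De-duplicate repeated retrievals, preserving relevance order.
    seen, deduped = set(), []
    for doc_id in retrieved_ids:
        if doc_id not in seen:
            seen.add(doc_id)
            deduped.append(doc_id)

    # 2. Keep already-cached documents in their cached (prefix) order,
    #    then append new documents in a canonical (sorted) order so that
    #    concurrent sessions retrieving the same set produce identical prompts.
    history = session_history or []
    in_prefix = [d for d in history if d in seen]
    new_docs = sorted(d for d in deduped if d not in set(history))
    ordered = in_prefix + new_docs

    # 3. Lightweight contextual hint: record the retriever's original
    #    relevance ranking, since the reuse-friendly order may differ.
    rank = {d: i + 1 for i, d in enumerate(deduped)}
    hint = "Relevance order: " + ", ".join(f"{d} (#{rank[d]})" for d in ordered)

    body = "\n\n".join(doc_store[d] for d in ordered)
    return hint + "\n\n" + body, ordered
```

With this ordering, two sessions that retrieve overlapping documents share a common prompt prefix regardless of the order the retriever returned them in, while the hint lets the model recover the original ranking.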
👥 Authors
Yinsicheng Jiang (University of Edinburgh, United Kingdom)
Yeqi Huang (University of Edinburgh; ServerlessAI)
Liang Cheng (University of Edinburgh, United Kingdom)
Cheng Deng (University of Edinburgh; On-device LLM, NLP, GeoAI)
Xuan Sun (University of Edinburgh, United Kingdom)
Luo Mai (Associate Professor at University of Edinburgh; Computer Systems, Machine Learning, Data Management)