Cache Mechanism for Agent RAG Systems

📅 2025-11-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high storage overhead and retrieval latency incurred by large-scale knowledge bases in Retrieval-Augmented Generation (RAG) for LLM-based agents, this paper proposes ARC—the first unsupervised, dynamic caching mechanism tailored for agent-level RAG. ARC jointly models the geometric structure of the embedding space and historical query distributions to automatically identify highly relevant passages without supervision, enabling online cache construction, compact maintenance, and adaptive updates. Its core components include a similarity- and frequency-aware cache selection strategy, a lightweight indexing scheme, and a low-latency retrieval design. Experiments across three benchmark datasets demonstrate that ARC compresses cache size to just 0.015% of the original corpus while achieving up to 79.8% answer coverage and reducing average retrieval latency by 80%. This work establishes the first systematic solution to efficient cache management in agent-centric RAG scenarios.
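The paper itself includes no code, but the "similarity- and frequency-aware cache selection strategy" can be sketched. Below is a minimal illustration that blends each passage's best cosine similarity to recent queries with its historical hit frequency; the function names, the linear blend, and the weight `alpha` are assumptions for illustration, not ARC's actual formulation:

```python
import numpy as np

def cache_score(passage_embs, query_embs, hit_counts, alpha=0.7):
    """Score passages for cache admission by blending embedding
    similarity to recent queries with historical hit frequency.
    (Illustrative only; not ARC's actual scoring rule.)"""
    # Cosine similarity of every passage to every recent query.
    p = passage_embs / np.linalg.norm(passage_embs, axis=1, keepdims=True)
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    sim = p @ q.T                      # shape: (n_passages, n_queries)
    max_sim = sim.max(axis=1)          # best-matching query per passage
    # Normalize hit counts to [0, 1] so the two terms are comparable.
    freq = hit_counts / max(hit_counts.max(), 1)
    return alpha * max_sim + (1 - alpha) * freq

def select_cache(passage_embs, query_embs, hit_counts, k):
    """Keep the top-k passages by blended score."""
    scores = cache_score(passage_embs, query_embs, hit_counts)
    return np.argsort(scores)[::-1][:k]
```

In this sketch, a small `k` relative to the corpus size would produce the kind of aggressive compression the paper reports (cache at 0.015% of the original corpus).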

📝 Abstract
Recent advances in Large Language Model (LLM)-based agents have been propelled by Retrieval-Augmented Generation (RAG), which grants the models access to vast external knowledge bases. Despite RAG's success in improving agent performance, agent-level cache management, particularly constructing, maintaining, and updating a compact, relevant corpus dynamically tailored to each agent's needs, remains underexplored. Therefore, we introduce ARC (Agent RAG Cache Mechanism), a novel, annotation-free caching framework that dynamically manages small, high-value corpora for each agent. By synthesizing historical query distribution patterns with the intrinsic geometry of cached items in the embedding space, ARC automatically maintains a high-relevance cache. Comprehensive experiments on three retrieval datasets demonstrate that ARC reduces storage requirements to 0.015% of the original corpus while achieving up to a 79.8% has-answer rate and reducing average retrieval latency by 80%. Our results demonstrate that ARC can drastically enhance efficiency and effectiveness in RAG-powered LLM agents.
Problem

Research questions and friction points this paper is trying to address.

Managing agent-level cache for RAG systems dynamically
Constructing compact relevant corpora for individual agent needs
Reducing storage requirements while maintaining retrieval performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic cache management for agent-specific corpora
Annotation-free framework using query patterns and embedding geometry
Reduces storage needs while improving retrieval speed and accuracy
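The online construction/maintenance loop described above (serve from a small cache when possible, fall back to the full corpus otherwise, and admit newly retrieved passages) can be sketched as follows. The class, the similarity threshold, and the least-used eviction policy are illustrative assumptions, not ARC's published design:

```python
import numpy as np

class RagCache:
    """Toy dynamic RAG cache: answer from a small cached corpus when a
    query is similar enough to a cached item, otherwise fall back to
    full-corpus retrieval and admit the result. (Illustrative sketch;
    threshold and eviction policy are assumptions, not ARC's.)"""

    def __init__(self, capacity, threshold=0.8):
        self.capacity = capacity
        self.threshold = threshold
        self.embs = []      # cached passage embeddings
        self.passages = []  # cached passage texts
        self.hits = []      # per-item hit counts

    def _best(self, q):
        """Return (index, cosine similarity) of the closest cached item."""
        if not self.embs:
            return None, -1.0
        E = np.stack(self.embs)
        sims = E @ q / (np.linalg.norm(E, axis=1) * np.linalg.norm(q))
        i = int(np.argmax(sims))
        return i, float(sims[i])

    def retrieve(self, q_emb, fallback):
        """fallback(q_emb) -> (text, emb) retrieved from the full corpus."""
        i, sim = self._best(q_emb)
        if sim >= self.threshold:
            self.hits[i] += 1
            return self.passages[i]      # cache hit: low-latency path
        text, emb = fallback(q_emb)      # cache miss: full retrieval
        self._admit(text, emb)
        return text

    def _admit(self, text, emb):
        if len(self.passages) >= self.capacity:
            victim = int(np.argmin(self.hits))   # evict least-used item
            for lst in (self.passages, self.embs, self.hits):
                lst.pop(victim)
        self.passages.append(text)
        self.embs.append(emb)
        self.hits.append(0)
```

With repeated similar queries, most lookups bypass the full index entirely, which is the mechanism behind the reported latency reduction.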
👥 Authors
Shuhang Lin
Rutgers University, PhD student (NLP)
Zhencan Peng
Rutgers University
Lingyao Li
Assistant Professor, School of Information, University of South Florida (Generative AI, Social Computing, Urban Computing, Health Informatics)
Xiao Lin
University of Illinois Urbana–Champaign
Xi Zhu
Rutgers University
Yongfeng Zhang
Rutgers University