🤖 AI Summary
To address the high storage overhead and retrieval latency incurred by large-scale knowledge bases in Retrieval-Augmented Generation (RAG) for LLM-based agents, this paper proposes ARC, the first unsupervised, dynamic caching mechanism tailored for agent-level RAG. ARC jointly models the geometric structure of the embedding space and historical query distributions to identify highly relevant passages without supervision, enabling online cache construction, compact maintenance, and adaptive updates. Its core components include a similarity- and frequency-aware cache selection strategy, a lightweight indexing scheme, and a low-latency retrieval design. Experiments across three benchmark datasets demonstrate that ARC compresses the cache to just 0.015% of the original corpus while achieving up to a 79.8% has-answer rate and reducing average retrieval latency by 80%. This work establishes the first systematic solution to efficient cache management in agent-centric RAG scenarios.
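To make the "similarity- and frequency-aware" selection idea concrete, here is a minimal, hypothetical sketch of how a cache could score passages by combining embedding similarity to recent queries with historical hit frequency. The function names, the `alpha` weighting, and the max-over-queries aggregation are illustrative assumptions, not ARC's actual algorithm as published:

```python
import numpy as np

def cache_scores(item_embs, query_embs, hit_counts, alpha=0.5):
    """Score cached passages by (a) cosine similarity to recent queries
    and (b) normalized historical hit frequency.

    Illustrative scoring only; ARC's exact rule may differ."""
    # Normalize rows so the dot product below is cosine similarity.
    item_n = item_embs / np.linalg.norm(item_embs, axis=1, keepdims=True)
    query_n = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    sim = item_n @ query_n.T                  # (n_items, n_queries)
    sim_score = sim.max(axis=1)               # best match over recent queries
    freq = np.asarray(hit_counts, dtype=float)
    freq_score = freq / max(freq.sum(), 1.0)  # normalized hit frequency
    return alpha * sim_score + (1 - alpha) * freq_score

def select_cache(item_embs, query_embs, hit_counts, capacity):
    """Keep the `capacity` highest-scoring passages; return their indices."""
    scores = cache_scores(item_embs, query_embs, hit_counts)
    return np.argsort(scores)[::-1][:capacity]
```

Under this sketch, a passage survives eviction if it is either geometrically close to the recent query distribution or frequently hit in the past, which mirrors the paper's stated goal of keeping a small, high-value corpus per agent.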
📝 Abstract
Recent advances in Large Language Model (LLM)-based agents have been propelled by Retrieval-Augmented Generation (RAG), which grants the models access to vast external knowledge bases. Despite RAG's success in improving agent performance, agent-level cache management, particularly constructing, maintaining, and updating a compact, relevant corpus dynamically tailored to each agent's needs, remains underexplored. Therefore, we introduce ARC (Agent RAG Cache Mechanism), a novel, annotation-free caching framework that dynamically manages small, high-value corpora for each agent. By synthesizing historical query distribution patterns with the intrinsic geometry of cached items in the embedding space, ARC automatically maintains a high-relevance cache. Through comprehensive experiments on three retrieval datasets, we demonstrate that ARC reduces storage requirements to 0.015% of the original corpus while offering up to a 79.8% has-answer rate and reducing average retrieval latency by 80%. These results show that ARC can drastically enhance both efficiency and effectiveness in RAG-powered LLM agents.