CacheRAG: A Semantic Caching System for Retrieval-Augmented Generation in Knowledge Graph Question Answering

📅 2026-04-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing LLM-based knowledge graph question answering systems: because they plan each query without reference to historical query patterns, they are prone to schema hallucination and limited retrieval coverage. The authors propose a semantic caching architecture that turns a stateless query planner into a continual learning system. It rests on three cache design principles: a schema-agnostic natural language interface built on an intermediate semantic representation, diversity-aware cache retrieval via a two-layer domain-aspect hierarchical index combined with Maximal Marginal Relevance (MMR), and deterministic, bounded subgraph expansion operators with complexity guarantees. The authors report that the system outperforms state-of-the-art baselines, with gains of 13.2% in accuracy and 17.5% in truthfulness on the CRAG benchmark.
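
To make the diversity-aware retrieval idea concrete, the following is a minimal sketch of standard MMR selection over a two-layer (domain, then aspect) lookup. It is not the authors' implementation: the names `cache_index`, `mmr_select`, and `retrieve_examples`, the toy 2-D vectors, and the use of cosine similarity are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(query_vec, candidates, k, lam=0.5):
    """Standard MMR: greedily pick the candidate maximizing
    lam * sim(query, c) - (1 - lam) * max sim(c, already_selected).
    candidates: list of (key, vector) pairs. Returns keys in selection order."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(item):
            _, vec = item
            relevance = cosine(query_vec, vec)
            redundancy = max((cosine(vec, s) for _, s in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [key for key, _ in selected]

# Hypothetical two-layer index: domain -> aspect -> cached (key, embedding) entries.
cache_index = {
    "movies": {
        "cast": [("q1", [1.0, 0.0]), ("q2", [0.99, 0.05]), ("q3", [0.6, 0.8])],
    },
}

def retrieve_examples(index, domain, aspect, query_vec, k=2, lam=0.5):
    """Narrow to one (domain, aspect) bucket, then diversify within it via MMR."""
    bucket = index.get(domain, {}).get(aspect, [])
    return mmr_select(query_vec, bucket, k, lam)
```

With a low `lam` the second pick favors an entry that differs from the first, while `lam=1.0` reduces to plain similarity ranking; how CacheRAG actually sets this trade-off is not stated in the summary.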

📝 Abstract

The integration of Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) has significantly advanced Knowledge Graph Question Answering (KGQA). However, existing LLM-driven KGQA systems act as stateless planners, generating retrieval plans in isolation without exploiting historical query patterns, much like a database system that optimizes every query from scratch without a plan cache. This fundamental design flaw leads to schema hallucinations and limited retrieval coverage. We propose CacheRAG, a systematic cache-augmented architecture for LLM-based KGQA that transforms stateless planners into continual learners. Unlike traditional database plan caching (which optimizes for frequency), CacheRAG introduces three novel design principles tailored for LLM contexts: (1) Schema-agnostic user interface: A two-stage semantic parsing framework via Intermediate Semantic Representation (ISR) enables non-expert users to interact purely in natural language, while a Backend Adapter grounds the LLM with local schema context to compile executable physical queries safely. (2) Diversity-optimized cache retrieval: A two-layer hierarchical index (Domain $\rightarrow$ Aspect) coupled with Maximal Marginal Relevance (MMR) maximizes structural variety in cached examples, effectively mitigating reasoning homogeneity. (3) Bounded heuristic expansion: Deterministic depth and breadth subgraph operators with strict complexity guarantees significantly enhance retrieval recall without risking unbounded API execution. Extensive experiments on multiple benchmarks demonstrate that CacheRAG significantly outperforms state-of-the-art baselines (e.g., +13.2% accuracy and +17.5% truthfulness on the CRAG dataset).
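
The bounded expansion principle in (3) can be illustrated with a small sketch. The abstract does not specify the operators, so the version below is an assumption: a breadth-first traversal with a hard depth limit and a per-node fan-out cap, with neighbors sorted so the result is deterministic. The function name `bounded_expand` and the parameters `max_depth` and `max_breadth` are hypothetical.

```python
from collections import deque

def bounded_expand(graph, seed, max_depth=2, max_breadth=2):
    """Expand a subgraph around `seed` with hard limits.
    graph: dict mapping node -> list of neighbor nodes (assumed adjacency format).
    Each expanded node contributes at most `max_breadth` new nodes, and no node
    deeper than `max_depth` is added, so at most
    1 + b + b^2 + ... + b^d nodes are returned (b = max_breadth, d = max_depth).
    Returns the visited nodes in breadth-first order."""
    visited = [seed]
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached: do not expand further
        # Sort neighbors so the same input always yields the same expansion.
        fresh = [n for n in sorted(graph.get(node, [])) if n not in seen]
        for nbr in fresh[:max_breadth]:  # breadth cap per expanded node
            seen.add(nbr)
            visited.append(nbr)
            queue.append((nbr, depth + 1))
    return visited

# Toy graph for illustration only.
toy_graph = {"A": ["D", "B", "C"], "B": ["E", "F", "G"], "C": ["H"], "E": ["Z"]}
```

Because both limits are fixed before traversal starts, the number of neighbor lookups (and hence any backing API calls) has a known upper bound, which is the property the abstract attributes to the bounded operators.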

Problem

Research questions and friction points this paper is trying to address.

Retrieval-Augmented Generation

Knowledge Graph Question Answering

Semantic Caching

Schema Hallucination

Query Planning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic Caching

Retrieval-Augmented Generation

Knowledge Graph Question Answering