🤖 AI Summary
This work addresses the limitations of existing large language model (LLM)-based knowledge graph question answering systems, which plan each query in isolation and ignore historical query patterns, leading to schema hallucination and insufficient retrieval coverage. To overcome these issues, the authors propose CacheRAG, a semantic caching-enhanced architecture that turns a stateless query planner into a continual learning system. The core contributions are three cache design principles: a schema-agnostic natural language interface built on an intermediate semantic representation, diversity-aware retrieval via a two-layer Domain → Aspect hierarchical index combined with Maximal Marginal Relevance (MMR), and deterministic depth and breadth subgraph expansion with bounded complexity. The system outperforms state-of-the-art baselines, with reported gains of +13.2% accuracy and +17.5% truthfulness on the CRAG benchmark.
📝 Abstract
The integration of Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) has significantly advanced Knowledge Graph Question Answering (KGQA). However, existing LLM-driven KGQA systems act as stateless planners, generating retrieval plans in isolation without exploiting historical query patterns, much like a database system that optimizes every query from scratch without a plan cache. This design flaw leads to schema hallucinations and limited retrieval coverage. We propose CacheRAG, a systematic cache-augmented architecture for LLM-based KGQA that transforms stateless planners into continual learners. Unlike traditional database plan caching (which optimizes for frequency), CacheRAG introduces three novel design principles tailored for LLM contexts: (1) Schema-agnostic user interface: A two-stage semantic parsing framework via Intermediate Semantic Representation (ISR) enables non-expert users to interact purely in natural language, while a Backend Adapter grounds the LLM with local schema context to compile executable physical queries safely. (2) Diversity-optimized cache retrieval: A two-layer hierarchical index (Domain $\rightarrow$ Aspect) coupled with Maximal Marginal Relevance (MMR) maximizes structural variety in cached examples, mitigating reasoning homogeneity. (3) Bounded heuristic expansion: Deterministic depth and breadth subgraph operators with strict complexity guarantees improve retrieval recall without risking unbounded API execution. Extensive experiments on multiple benchmarks demonstrate that CacheRAG outperforms state-of-the-art baselines (e.g., +13.2% accuracy and +17.5% truthfulness on the CRAG dataset).
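To make principle (2) concrete, the following is a minimal sketch of MMR selection over a two-layer Domain → Aspect cache. It is an illustration under stated assumptions, not the authors' implementation: the names `cache`, `mmr_select`, and `plan_a`/`plan_b`/`plan_c`, the toy 2-D embeddings, the cosine similarity measure, and the trade-off weight `lam` are all hypothetical, since the abstract does not specify them.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_select(query_vec, candidates, k=2, lam=0.5):
    """Greedy Maximal Marginal Relevance.

    candidates: list of (entry_id, vector) pairs.
    lam: hypothetical trade-off weight; 1.0 is pure relevance,
         lower values penalize similarity to already selected entries.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(item):
            _, vec = item
            relevance = cosine(query_vec, vec)
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine(vec, chosen_vec) for _, chosen_vec in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [entry_id for entry_id, _ in selected]


# Hypothetical two-layer index: domain -> aspect -> cached (plan id, embedding).
cache = defaultdict(lambda: defaultdict(list))
cache["movie"]["director"].append(("plan_a", [1.0, 0.0]))
cache["movie"]["director"].append(("plan_b", [0.99, 0.05]))  # near-duplicate of plan_a
cache["movie"]["director"].append(("plan_c", [0.6, 0.8]))    # structurally different

query = [1.0, 0.1]
bucket = cache["movie"]["director"]          # layer 1 and layer 2 lookup
diverse = mmr_select(query, bucket, k=2, lam=0.5)
relevance_only = mmr_select(query, bucket, k=2, lam=1.0)
```

In this toy setup, `diverse` contains `plan_b` and `plan_c`, whereas `relevance_only` returns the two near-duplicates `plan_b` and `plan_a`. That difference is the kind of reasoning homogeneity the diversity term is meant to avoid.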