🤖 AI Summary
To address the semantic misalignment between natural-language queries and graph-structured knowledge in knowledge-graph-driven retrieval-augmented generation (KG-RAG), a mismatch that can lead large language models (LLMs) to hallucinate, this paper proposes a two-stage alignment framework: an LLM first parses the query into a graph pattern, and the framework then retrieves semantically similar subgraphs by matching that pattern against the KG. Key contributions include: (1) Graph Semantic Distance (GSD), a novel metric that quantifies the semantic alignment between a pattern and a candidate subgraph; (2) an optimized retrieval algorithm that identifies the top-k subgraphs within 1 second on a 10-million-scale KG; and (3) an LLM-guided, interpretable mechanism for generating graph patterns. Experiments show that the method outperforms state-of-the-art KG-RAG baselines on both question answering and fact verification. The implementation is publicly available.
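This summary does not give a formula for GSD. As a minimal sketch, assuming GSD sums the embedding distances between each pattern node or edge and its aligned counterpart in a candidate subgraph (our assumption, not the paper's stated definition), the metric could be computed as below. The L2 distance, the toy 2-D embeddings, and the function name `graph_semantic_distance` are hypothetical choices for illustration.

```python
import math

Vector = list[float]


def graph_semantic_distance(
    pattern_nodes: dict[str, Vector],
    pattern_edges: dict[tuple[str, str], Vector],
    sub_nodes: dict[str, Vector],
    sub_edges: dict[tuple[str, str], Vector],
    mapping: dict[str, str],
) -> float:
    """Assumed GSD form: summed L2 distances over aligned nodes and edges."""
    total = 0.0
    # Node term: compare each pattern node with the subgraph node it is mapped to.
    for p_node, p_vec in pattern_nodes.items():
        total += math.dist(p_vec, sub_nodes[mapping[p_node]])
    # Edge term: compare each pattern relation with the mapped subgraph relation.
    for (u, v), p_vec in pattern_edges.items():
        s_key = (mapping[u], mapping[v])
        if s_key not in sub_edges:
            raise ValueError(f"edge {s_key} is missing from the subgraph")
        total += math.dist(p_vec, sub_edges[s_key])
    return total


# Toy 2-D "embeddings" (hypothetical; a real system would use a text encoder).
pattern_nodes = {"x": [0.0, 0.0], "y": [1.0, 0.0]}
pattern_edges = {("x", "y"): [0.0, 1.0]}
sub_nodes = {"a": [0.0, 0.0], "b": [1.0, 1.0]}
sub_edges = {("a", "b"): [0.0, 1.0]}
print(graph_semantic_distance(pattern_nodes, pattern_edges, sub_nodes, sub_edges, {"x": "a", "y": "b"}))  # → 1.0
```

A lower value means a closer semantic match; here only node `y` differs from its counterpart `b`, contributing a distance of 1.0.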
📝 Abstract
Recent advancements in large language models (LLMs) have shown impressive versatility across various tasks. To mitigate their hallucinations, retrieval-augmented generation (RAG) has emerged as a powerful approach, leveraging external knowledge sources such as knowledge graphs (KGs). In this paper, we study the task of KG-driven RAG and propose a novel Similar Graph Enhanced Retrieval-Augmented Generation (SimGRAG) method. It addresses the challenge of aligning query texts with KG structures through a two-stage process: (1) query-to-pattern, which uses an LLM to transform a query into the desired graph pattern, and (2) pattern-to-subgraph, which quantifies the alignment between the pattern and candidate subgraphs using a graph semantic distance (GSD) metric. We also develop an optimized retrieval algorithm that identifies the top-k subgraphs within 1 second on a 10-million-scale KG. Extensive experiments show that SimGRAG outperforms state-of-the-art KG-driven RAG methods in both question answering and fact verification. Our code is available at https://github.com/YZ-Cai/SimGRAG.
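To make "top-k subgraphs" concrete, the pattern-to-subgraph stage can be sketched as a brute-force search. This is not SimGRAG's optimized algorithm: it enumerates every injective mapping of pattern nodes to KG nodes, keeps the mappings in which each pattern edge lands on an existing KG edge, and ranks them by an assumed GSD form (summed L2 distances between embeddings). The names `top_k_subgraphs` and `score` and the toy embeddings are hypothetical.

```python
import heapq
import itertools
import math

Vector = list[float]
# (node embeddings, edge embeddings keyed by (source, target))
Graph = tuple[dict[str, Vector], dict[tuple[str, str], Vector]]


def score(pattern: Graph, kg: Graph, mapping: dict[str, str]) -> float:
    """Assumed GSD form: summed L2 distances over aligned nodes and edges."""
    p_nodes, p_edges = pattern
    k_nodes, k_edges = kg
    node_cost = sum(math.dist(vec, k_nodes[mapping[n]]) for n, vec in p_nodes.items())
    edge_cost = sum(
        math.dist(vec, k_edges[(mapping[u], mapping[v])]) for (u, v), vec in p_edges.items()
    )
    return node_cost + edge_cost


def top_k_subgraphs(pattern: Graph, kg: Graph, k: int) -> list[tuple[float, dict[str, str]]]:
    """Brute-force top-k: try every injective node mapping that preserves pattern edges."""
    p_nodes, p_edges = pattern
    k_nodes, k_edges = kg
    order = list(p_nodes)
    candidates = []
    for image in itertools.permutations(k_nodes, len(order)):
        mapping = dict(zip(order, image))
        # Structural check: every pattern edge must map onto an existing KG edge.
        if all((mapping[u], mapping[v]) in k_edges for u, v in p_edges):
            candidates.append((score(pattern, kg, mapping), mapping))
    # Smallest distance first; the key avoids comparing dicts on ties.
    return heapq.nsmallest(k, candidates, key=lambda c: c[0])


pattern = ({"x": [0.0, 0.0], "y": [1.0, 0.0]}, {("x", "y"): [0.0, 1.0]})
kg = (
    {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [1.0, 1.0]},
    {("a", "b"): [0.0, 1.0], ("a", "c"): [0.0, 1.0], ("b", "c"): [5.0, 5.0]},
)
for gsd, mapping in top_k_subgraphs(pattern, kg, k=2):
    print(gsd, mapping)
# 0.0 {'x': 'a', 'y': 'b'}
# 1.0 {'x': 'a', 'y': 'c'}
```

The enumeration grows as n!/(n-m)! for a KG with n nodes and a pattern with m nodes, so it is only usable on toy graphs; on a 10-million-scale KG this cost is exactly what an optimized retrieval algorithm has to avoid.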