SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generation

📅 2024-12-17

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Knowledge graph-driven retrieval-augmented generation (KG-RAG) suffers from semantic misalignment between natural language queries and graph-structured knowledge, which can lead to hallucinations in large language models (LLMs). This paper proposes SimGRAG, a two-stage alignment method: it first uses an LLM to transform the query into a desired graph pattern, then retrieves the subgraphs that best match that pattern. Key contributions include: (1) an LLM-based query-to-pattern step; (2) graph semantic distance (GSD), a metric that quantifies the alignment between the pattern and candidate subgraphs; and (3) an optimized retrieval algorithm that identifies the top-k subgraphs within 1 second on a 10-million-scale KG. Experiments show that SimGRAG outperforms state-of-the-art KG-driven RAG methods on both question answering and fact verification. The code is publicly available.

📝 Abstract

Recent advancements in large language models (LLMs) have shown impressive versatility across various tasks. To eliminate their hallucinations, retrieval-augmented generation (RAG) has emerged as a powerful approach, leveraging external knowledge sources like knowledge graphs (KGs). In this paper, we study the task of KG-driven RAG and propose a novel Similar Graph Enhanced Retrieval-Augmented Generation (SimGRAG) method. It effectively addresses the challenge of aligning query texts and KG structures through a two-stage process: (1) query-to-pattern, which uses an LLM to transform queries into a desired graph pattern, and (2) pattern-to-subgraph, which quantifies the alignment between the pattern and candidate subgraphs using a graph semantic distance (GSD) metric. We also develop an optimized retrieval algorithm that efficiently identifies the top-k subgraphs within 1 second on a 10-million-scale KG. Extensive experiments show that SimGRAG outperforms state-of-the-art KG-driven RAG methods in both question answering and fact verification. Our code is available at https://github.com/YZ-Cai/SimGRAG.
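
To make the pattern-to-subgraph step more concrete, here is a minimal Python sketch of how a distance between a graph pattern and a candidate subgraph might be computed. The abstract does not give the GSD formula, so everything here is an assumption for illustration: the triple representation, the toy `embed` function (a stand-in for a real text embedding model), and the choice to sum embedding distances over aligned nodes and relations. It is not the authors' implementation.

```python
import math
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail); hypothetical representation


def embed(text: str, dim: int = 64) -> List[float]:
    """Toy stand-in for a text embedding model (assumption): hashed character
    trigrams, L2-normalized. A real system would call an embedding model."""
    vec = [0.0] * dim
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        # Deterministic hash; Python's built-in hash() is salted per process.
        idx = sum(ord(c) * (31 ** j) for j, c in enumerate(trigram)) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def l2(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def graph_semantic_distance(pattern: List[Triple], subgraph: List[Triple]) -> float:
    """Illustrative distance: sum of embedding distances over aligned nodes
    and relations. Assumes pattern[i] is already aligned with subgraph[i]."""
    if len(pattern) != len(subgraph):
        raise ValueError("pattern and subgraph must have the same number of triples")
    node_map: Dict[str, str] = {}  # pattern node -> matched subgraph node
    total = 0.0
    for (p_head, p_rel, p_tail), (s_head, s_rel, s_tail) in zip(pattern, subgraph):
        for p_node, s_node in ((p_head, s_head), (p_tail, s_tail)):
            if p_node in node_map:
                if node_map[p_node] != s_node:
                    raise ValueError(f"inconsistent match for pattern node {p_node!r}")
            else:
                node_map[p_node] = s_node
                total += l2(embed(p_node), embed(s_node))  # count each node once
        total += l2(embed(p_rel), embed(s_rel))  # count every relation
    return total


pattern = [("Tom Hanks", "starred in", "Forrest Gump")]
candidate = [("Tom Hanks", "acted in", "Forrest Gump")]
print(graph_semantic_distance(pattern, pattern))    # exact match
print(graph_semantic_distance(pattern, candidate))  # relation wording differs
```

An exact match yields a distance of zero, and any difference in node or relation wording adds a positive term, so lower values indicate closer alignment under these assumptions.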

Problem

Research questions and friction points this paper is trying to address.

Aligning query texts with KG structures effectively

Improving retrieval speed for large-scale knowledge graphs

Enhancing accuracy in question answering and fact verification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses an LLM to transform queries into graph patterns

Quantifies pattern-to-subgraph alignment with a graph semantic distance (GSD) metric

Retrieves the top-k subgraphs within 1 second on a 10-million-scale KG
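
The summary does not describe how the optimized retrieval works, so the sketch below shows only a generic technique for top-k search under an additive, non-negative distance: keep the k best candidates in a bounded heap and abandon a candidate as soon as its partial distance reaches the current k-th best. The function name `top_k_with_pruning` and the input format (a candidate id paired with its per-element distances) are hypothetical, and this is not claimed to be the paper's algorithm.

```python
import heapq
from typing import Iterable, List, Sequence, Tuple


def top_k_with_pruning(
    candidates: Iterable[Tuple[str, Sequence[float]]], k: int
) -> List[Tuple[float, str]]:
    """Return the k candidates with the smallest total distance.

    Each candidate is (id, per-element distances); all distances are assumed
    non-negative, so a partial sum is a valid lower bound on the total.
    """
    if k <= 0:
        return []
    heap: List[Tuple[float, str]] = []  # max-heap via negated distance
    for cand_id, parts in candidates:
        # Current k-th best distance, or infinity while the heap is not full.
        bound = -heap[0][0] if len(heap) == k else float("inf")
        total = 0.0
        pruned = False
        for d in parts:
            total += d
            if total >= bound:  # cannot beat the current top-k: stop early
                pruned = True
                break
        if pruned:
            continue
        if len(heap) == k:
            heapq.heapreplace(heap, (-total, cand_id))
        else:
            heapq.heappush(heap, (-total, cand_id))
    return sorted((-neg, cand_id) for neg, cand_id in heap)


candidates = [
    ("a", [0.5, 0.5]),
    ("b", [0.125, 0.25]),
    ("c", [2.0, 0.125]),  # abandoned after its first element
    ("d", [0.0, 0.25]),
]
print(top_k_with_pruning(candidates, k=2))
```

Early abandonment only pays off when per-element distances are cheap to compute one at a time, which is an assumption about the workload rather than something stated in the source.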

🔎 Similar Papers

GRAG: Graph Retrieval-Augmented Generation