Ground-Truth Subgraphs for Better Training and Evaluation of Knowledge Graph Augmented LLMs

📅 2025-11-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Research on knowledge graph-augmented large language models (KG-LLMs) lacks challenging question-answering datasets with ground-truth subgraph annotations, which makes it hard to evaluate and compare KG retrievers rigorously. To address this, the paper proposes SynthKGQA, a framework that synthesizes KGQA datasets from any knowledge graph and pairs each question with the full set of ground-truth facts needed to reason over it. The contributions are threefold: (1) a dataset-generation framework that can be applied to any knowledge graph; (2) GTSQA, a dataset generated from Wikidata and designed to test zero-shot generalization of KG retrievers to unseen graph structures and relation types; and (3) a benchmark of popular KG-augmented LLM methods on GTSQA, together with evidence that data produced with SynthKGQA can be used to train better models.

📝 Abstract

Retrieval of information from graph-structured knowledge bases represents a promising direction for improving the factuality of LLMs. While various solutions have been proposed, a comparison of methods is difficult due to the lack of challenging QA datasets with ground-truth targets for graph retrieval. We present SynthKGQA, a framework for generating high-quality synthetic Knowledge Graph Question Answering datasets from any Knowledge Graph, providing the full set of ground-truth facts in the KG to reason over each question. We show how, in addition to enabling more informative benchmarking of KG retrievers, the data produced with SynthKGQA also allows us to train better models. We apply SynthKGQA to Wikidata to generate GTSQA, a new dataset designed to test zero-shot generalization abilities of KG retrievers with respect to unseen graph structures and relation types, and benchmark popular solutions for KG-augmented LLMs on it.
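To make concrete what ground-truth targets for graph retrieval allow, here is a minimal sketch of scoring a retriever's output against an annotated subgraph. It assumes facts are represented as (subject, relation, object) triples and uses set-based precision, recall, and F1; the function name `retrieval_scores`, the triple format, and the toy facts are illustrative assumptions, not the metrics or data format used in the paper.

```python
from typing import Dict, Iterable, Tuple

# Assumed representation: one KG fact = (subject, relation, object).
Triple = Tuple[str, str, str]


def retrieval_scores(
    retrieved: Iterable[Triple], ground_truth: Iterable[Triple]
) -> Dict[str, float]:
    """Set-based precision/recall/F1 of retrieved triples vs. a ground-truth subgraph."""
    retrieved_set = set(retrieved)
    truth_set = set(ground_truth)
    hits = retrieved_set & truth_set  # facts that are both retrieved and required

    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(truth_set) if truth_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy example (hypothetical facts for a two-hop question such as
# "Where was Marie Curie's spouse born?").
ground_truth = [
    ("Marie Curie", "spouse", "Pierre Curie"),
    ("Pierre Curie", "place of birth", "Paris"),
]
retrieved = [
    ("Marie Curie", "spouse", "Pierre Curie"),
    ("Marie Curie", "occupation", "physicist"),
    ("Marie Curie", "place of birth", "Warsaw"),
]

scores = retrieval_scores(retrieved, ground_truth)
print(scores)
```

In this toy case the retriever finds one of the two required facts and returns two irrelevant ones, so recall is 1/2 and precision is 1/3. Without a ground-truth subgraph, only the final answer could be checked, which would hide whether the retriever actually surfaced the facts needed for the reasoning path.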

Problem

Research questions and friction points this paper is trying to address.

Lack of challenging QA datasets with ground-truth graph retrieval targets

Difficulty in comparing methods for knowledge graph augmented LLMs

Need for better training and evaluation of KG retrieval systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generating synthetic KGQA datasets from any Knowledge Graph

Providing the full set of ground-truth KG facts needed to reason over each question

Enabling better training and benchmarking of KG retrievers
