🤖 AI Summary
To address factual unreliability caused by hallucinations in large language models (LLMs) in high-stakes domains such as cyber operations, this paper presents a case study of LinkQ, an open-source natural language interface that mitigates hallucinations by requiring the LLM to query a knowledge graph (KG) for ground-truth data during question answering (QA). A quantitative evaluation on a well-known KGQA benchmark shows that LinkQ outperforms GPT-4 while still struggling with certain question categories, and a qualitative study with two domain experts using a real-world cybersecurity KG surfaces expert feedback, perceived limitations, and future opportunities. The primary contributions are: (1) a KG-grounded approach to hallucination mitigation, (2) quantitative and expert-driven qualitative evidence of its effectiveness and limits in a high-risk domain, and (3) identification of bottlenecks in constructing complex, multi-hop KG queries, suggesting that alternative query construction strategies merit further investigation.
📝 Abstract
High-stakes domains like cyber operations need responsible and trustworthy AI methods. While large language models (LLMs) are becoming increasingly popular in these domains, they still suffer from hallucinations. This research paper presents lessons learned from a case study with LinkQ, an open-source natural language interface developed to combat hallucinations by forcing an LLM to query a knowledge graph (KG) for ground-truth data during question answering (QA). We conduct a quantitative evaluation of LinkQ using a well-known KGQA dataset, showing that the system outperforms GPT-4 but still struggles with certain question categories, suggesting that alternative query construction strategies will need to be investigated in future LLM querying systems. We also discuss a qualitative study of LinkQ with two domain experts using a real-world cybersecurity KG, outlining these experts' feedback, suggestions, perceived limitations, and future opportunities for systems like LinkQ.
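To make the core idea concrete, here is a minimal sketch of KG-grounded question answering in the spirit of what the abstract describes: the model is never allowed to answer from its own parameters, but must instead emit a structured query that is executed against a ground-truth knowledge graph, with the final answer assembled from the query results. All names below (`TINY_KG`, `llm_generate_query`) are hypothetical illustrations under assumed behavior, not LinkQ's actual API or query language.

```python
# A toy knowledge graph as (subject, predicate, object) triples.
# A real system would query a full KG (e.g., via SPARQL); this
# in-memory list just illustrates the control flow.
TINY_KG = [
    ("CVE-2021-44228", "affects", "Apache Log4j"),
    ("CVE-2021-44228", "hasSeverity", "critical"),
    ("Apache Log4j", "writtenIn", "Java"),
]

def llm_generate_query(question: str) -> tuple:
    """Stand-in for the LLM's query-construction step: maps a
    natural language question to a (subject, predicate) pattern."""
    if "severity" in question.lower():
        return ("CVE-2021-44228", "hasSeverity")
    if "affect" in question.lower():
        return ("CVE-2021-44228", "affects")
    return (None, None)

def execute_query(kg, subject, predicate):
    """Match the pattern against the KG and return all objects."""
    return [o for s, p, o in kg if s == subject and p == predicate]

def answer(question: str) -> str:
    subject, predicate = llm_generate_query(question)
    results = execute_query(TINY_KG, subject, predicate)
    # The hard constraint: if the KG returns nothing, the system
    # declines to answer rather than hallucinating one.
    if not results:
        return "No supporting data found in the knowledge graph."
    return ", ".join(results)

print(answer("What is the severity of CVE-2021-44228?"))  # critical
```

The abstract's observation that LinkQ struggles with certain question categories maps onto the query-construction step here: multi-hop questions require chaining several such patterns, which is where query generation becomes the bottleneck.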