Research on Graph-Retrieval Augmented Generation Based on Historical Text Knowledge Graphs

📅 2025-06-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
General-purpose large language models (LLMs) suffer from domain-knowledge gaps, severe hallucination, and poor interpretability when analyzing historical texts. Method: We propose GraphRAG—a knowledge-graph-driven retrieval-augmented generation framework integrating chain-of-thought prompting, self-instruction generation, and process supervision—to enable high-quality, low-annotation person-relation extraction. Contribution/Results: We construct the first low-annotation person-relation dataset for the “First Four Histories.” Fine-tuning Xunzi-Qwen1.5-14B on this data achieves F1 = 0.68. Coupling DeepSeek with GraphRAG boosts F1 on C-CLUE by 11 percentage points (0.08 → 0.19), surpassing the Xunzi-Qwen1.5-14B baseline (F1 = 0.12). The approach significantly mitigates hallucination and enhances reasoning interpretability, establishing a novel paradigm for knowledge services over low-resource classical texts.

📝 Abstract
This article addresses domain knowledge gaps in general large language models for historical text analysis in the context of computational humanities and AIGC technology. We propose the Graph RAG framework, combining chain-of-thought prompting, self-instruction generation, and process supervision to create a character-relationship dataset for the “First Four Histories” with minimal manual annotation. This dataset supports automated historical knowledge extraction, reducing labor costs. In the graph-augmented generation phase, we introduce a collaborative mechanism between knowledge graphs and retrieval-augmented generation, improving the alignment of general models with historical knowledge. Experiments show that the domain-specific model Xunzi-Qwen1.5-14B, with Simplified Chinese input and chain-of-thought prompting, achieves optimal performance in relation extraction (F1 = 0.68). The DeepSeek model integrated with Graph RAG improves F1 by 11 percentage points (0.08 → 0.19) on the open-domain C-CLUE relation extraction dataset, surpassing the F1 of Xunzi-Qwen1.5-14B (0.12), effectively alleviating the hallucination phenomenon, and improving interpretability. This framework offers a low-resource solution for classical text knowledge extraction, advancing historical knowledge services and humanities research.
Problem

Research questions and friction points this paper is trying to address.

Addressing knowledge gaps in general large language models for historical text analysis.
Automating historical knowledge extraction with minimal manual annotation.
Improving alignment of general models with historical knowledge using Graph RAG.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph RAG combines chain-of-thought and self-instruction
Knowledge graphs enhance retrieval-augmented generation alignment
Low-resource solution for historical text extraction
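The graph-augmented retrieval mechanism described above can be sketched in miniature: person-relation triples form a knowledge graph, triples within a few hops of the query entity are retrieved, and they are serialized as grounding context for the generator. This is a hypothetical illustration, not the paper's implementation; all class and function names (`TripleStore`, `build_prompt`) are invented for the sketch.

```python
from collections import defaultdict

class TripleStore:
    """Toy knowledge graph of (head, relation, tail) person-relation triples."""
    def __init__(self):
        self.by_entity = defaultdict(list)  # entity -> triples touching it

    def add(self, head, relation, tail):
        triple = (head, relation, tail)
        self.by_entity[head].append(triple)
        self.by_entity[tail].append(triple)

    def retrieve(self, entity, hops=1):
        """Collect all triples within `hops` edges of the query entity."""
        seen, frontier, result = {entity}, [entity], []
        for _ in range(hops):
            next_frontier = []
            for node in frontier:
                for h, r, t in self.by_entity[node]:
                    if (h, r, t) not in result:
                        result.append((h, r, t))
                    for neighbor in (h, t):
                        if neighbor not in seen:
                            seen.add(neighbor)
                            next_frontier.append(neighbor)
            frontier = next_frontier
        return result

def build_prompt(question, triples):
    """Serialize retrieved triples as grounding context for the LLM."""
    facts = "\n".join(f"({h}, {r}, {t})" for h, r, t in triples)
    return (f"Known facts:\n{facts}\n\n"
            f"Question: {question}\n"
            f"Answer using only the facts above.")

kg = TripleStore()
kg.add("Liu Bei", "sworn brother of", "Guan Yu")
kg.add("Guan Yu", "served under", "Cao Cao")
prompt = build_prompt("Who is Liu Bei's sworn brother?",
                      kg.retrieve("Liu Bei", hops=1))
```

Constraining generation to retrieved triples is what gives the framework its claimed hallucination mitigation and interpretability: the answer can be traced back to explicit graph facts.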
Yang Fan
University of Science and Technology of China
Learning to Teach · Automated Machine Learning · Neural Architecture Search · Natural Language Processing · AI for Medicine
Zhang Qi
School of Economics and Management, Shanxi University, Taiyuan 030006, People’s Republic of China
Wenqian Xing
Stanford University
Operations Research
Liu Chang
College of Information Management, Nanjing Agricultural University, Nanjing 210095, People’s Republic of China
Liu Liu
College of Information Management, Nanjing Agricultural University, Nanjing 210095, People’s Republic of China