AIstorian lets AI be a historian: A KG-powered multi-agent system for accurate biography generation

📅 2025-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses three core challenges in historical biography generation: adherence to historical writing style, factual fidelity, and information fragmented across multiple documents. It proposes a knowledge graph (KG)-powered retrieval-augmented generation (RAG) framework and a multi-agent mechanism for hallucination detection and error-type-aware correction, together with an in-context-learning-based chunking strategy and a two-step style fine-tuning method. Combining KG-based indexing, multi-agent collaboration, data-augmentation-enhanced supervised fine-tuning, and stylistic preference optimization, the system reports a 3.8× improvement in factual accuracy and a 47.6% reduction in hallucination rate over existing baselines on a real-world historical Jinshi dataset. The key contribution is the integration of structured knowledge guidance, fine-grained hallucination mitigation, and historical writing style modeling for domain-adapted, factually grounded, and stylistically faithful biography generation.

📝 Abstract
Huawei has always been committed to exploring AI applications in historical research. Biography generation, as a specialized form of abstractive summarization, plays a crucial role in historical research but faces unique challenges that existing large language models (LLMs) struggle to address. These challenges include maintaining stylistic adherence to historical writing conventions, ensuring factual fidelity, and handling information fragmented across multiple documents. We present AIstorian, a novel end-to-end agentic system featuring knowledge graph (KG)-powered retrieval-augmented generation (RAG) and anti-hallucination multi-agent collaboration. Specifically, AIstorian introduces an in-context-learning-based chunking strategy and a KG-based index for accurate and efficient reference retrieval. Meanwhile, AIstorian orchestrates multiple agents to conduct on-the-fly hallucination detection and error-type-aware correction. Additionally, to teach LLMs a specific language style, we fine-tune them with a two-step training approach that combines data-augmentation-enhanced supervised fine-tuning with stylistic preference optimization. Extensive experiments on a real-life historical Jinshi dataset demonstrate that AIstorian achieves a 3.8x improvement in factual accuracy and a 47.6% reduction in hallucination rate compared to existing baselines. The data and code are available at: https://github.com/ZJU-DAILY/AIstorian.
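
To make the idea of a KG-based index for reference retrieval more concrete, here is a minimal Python sketch. It is not the AIstorian implementation: the triple layout, the sample data, and the function names `build_kg_index` and `retrieve_references` are all assumptions made for illustration. It only shows the general pattern of linking KG facts back to the text chunks they came from, so that references for one person can be gathered across documents.

```python
from collections import defaultdict

# Hypothetical toy data: each triple records the text chunk it was extracted from.
TRIPLES = [
    ("Person A", "born_in", "Place X", "chunk-1"),
    ("Person A", "passed_exam_in", "Year Y", "chunk-2"),
    ("Person B", "born_in", "Place X", "chunk-3"),
]

CHUNKS = {
    "chunk-1": "Person A was a native of Place X.",
    "chunk-2": "Person A passed the examination in Year Y.",
    "chunk-3": "Person B was also a native of Place X.",
}


def build_kg_index(triples):
    """Map each subject entity to the (relation, object, chunk_id) facts about it."""
    index = defaultdict(list)
    for subj, rel, obj, chunk_id in triples:
        index[subj].append((rel, obj, chunk_id))
    return index


def retrieve_references(index, chunks, entity):
    """Return the source chunks for one entity, deduplicated, in first-seen order."""
    seen = []
    for _rel, _obj, chunk_id in index.get(entity, []):
        if chunk_id not in seen:
            seen.append(chunk_id)
    return [chunks[c] for c in seen]


index = build_kg_index(TRIPLES)
refs = retrieve_references(index, CHUNKS, "Person A")
```

In this sketch, `refs` holds the two chunks that mention Person A even though they sit in different places in the corpus, which is the fragmentation problem the abstract describes. A real system would extract the triples with an LLM or an information-extraction model rather than hard-coding them.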

Problem

Research questions and friction points this paper is trying to address.

Generating accurate historical biographies with LLMs and knowledge graphs.
Adhering to historical writing style while preserving factual fidelity.
Reducing hallucination rates and improving factual accuracy in generated biographies.

Innovation

Methods, ideas, or system contributions that make the work stand out.

KG-powered retrieval-augmented generation system
Multi-agent hallucination detection and correction
Two-step LLM fine-tuning for style adherence
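
The detect-then-correct idea behind the multi-agent component can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the paper's method: the two error types (`contradiction`, `unsupported`), the dictionary representation of a draft, and the functions `detect`, `correct`, and `detect_and_correct` are hypothetical, and the "agents" here are plain functions rather than LLM calls.

```python
# Hypothetical reference facts, standing in for retrieved evidence.
REFERENCE = {"birthplace": "Place X", "exam_year": "Year Y"}


def detect(draft, reference):
    """Detector agent: return (field, error_type) pairs found in the draft."""
    errors = []
    for field, value in draft.items():
        if field not in reference:
            errors.append((field, "unsupported"))    # no evidence for the claim
        elif value != reference[field]:
            errors.append((field, "contradiction"))  # conflicts with evidence
    return errors


def correct(draft, reference, errors):
    """Corrector agent: apply a different fix depending on the error type."""
    fixed = dict(draft)
    for field, error_type in errors:
        if error_type == "contradiction":
            fixed[field] = reference[field]  # overwrite with the evidenced value
        elif error_type == "unsupported":
            del fixed[field]                 # drop claims that lack evidence
    return fixed


def detect_and_correct(draft, reference, max_rounds=3):
    """Alternate detection and correction until the draft is clean or rounds run out."""
    for _ in range(max_rounds):
        errors = detect(draft, reference)
        if not errors:
            break
        draft = correct(draft, reference, errors)
    return draft


draft = {"birthplace": "Place Z", "exam_year": "Year Y", "office": "Magistrate"}
result = detect_and_correct(draft, REFERENCE)
```

Here the contradicted birthplace is replaced with the evidenced value and the unsupported office claim is dropped, while the correct exam year is left untouched. Routing each error type to its own fix is the point of the sketch; the actual error taxonomy and correction prompts are defined in the paper and its repository.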