Addressing accuracy and hallucination of LLMs in Alzheimer's disease research through knowledge graphs

📅 2025-08-28
🤖 AI Summary
Large language models (LLMs) applied to Alzheimer's disease (AD) research are limited by hallucinations, gaps in domain-specific knowledge, and responses that cannot be traced back to their sources. This paper evaluates two popular graph-based retrieval-augmented generation (GraphRAG) systems against these limitations. The authors compile a database of 50 AD papers and 70 expert questions, construct a GraphRAG knowledge base from it, and use GPT-4o as the answering model. They compare the quality of GraphRAG responses with those of a standard GPT-4o model, and they discuss and evaluate the traceability of several RAG and GraphRAG systems. They also provide an easy-to-use interface with a pre-built AD database so that researchers can test both standard RAG and GraphRAG.

📝 Abstract
In the past two years, large language model (LLM)-based chatbots, such as ChatGPT, have revolutionized various domains by enabling diverse task completion and question-answering capabilities. However, their application in scientific research remains constrained by challenges such as hallucinations, limited domain-specific knowledge, and lack of explainability or traceability for the response. Graph-based Retrieval-Augmented Generation (GraphRAG) has emerged as a promising approach to improving chatbot reliability by integrating domain-specific contextual information before response generation, addressing some limitations of standard LLMs. Despite its potential, there are only limited studies that evaluate GraphRAG on specific domains that require intensive knowledge, like Alzheimer's disease or other biomedical domains. In this paper, we assess the quality and traceability of two popular GraphRAG systems. We compile a database of 50 papers and 70 expert questions related to Alzheimer's disease, construct a GraphRAG knowledge base, and employ GPT-4o as the LLM for answering queries. We then compare the quality of responses generated by GraphRAG with those from a standard GPT-4o model. Additionally, we discuss and evaluate the traceability of several Retrieval-Augmented Generation (RAG) and GraphRAG systems. Finally, we provide an easy-to-use interface with a pre-built Alzheimer's disease database for researchers to test the performance of both standard RAG and GraphRAG.
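The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline or the API of any particular GraphRAG system: the `Triple` type, the toy graph, the entity names, and the `paper_01`-style source identifiers are all hypothetical placeholders. The sketch only shows the general idea of pulling graph facts near the entities named in a question and keeping each fact's source so the answer can be traced. The call to the LLM itself is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str
    source: str  # hypothetical identifier of the paper the fact came from


# Toy graph with placeholder entities and sources (not data from the paper).
GRAPH = [
    Triple("entity_a", "associated_with", "entity_b", "paper_01"),
    Triple("entity_b", "measured_in", "entity_c", "paper_02"),
    Triple("entity_d", "inhibits", "entity_e", "paper_03"),
]


def retrieve(question: str, graph: list[Triple], hops: int = 1) -> list[Triple]:
    """Return triples within `hops` steps of entities named in the question."""
    text = question.lower()
    # Seed the search with every graph entity that appears in the question.
    frontier = {e for t in graph for e in (t.head, t.tail) if e in text}
    selected: list[Triple] = []
    for _ in range(hops):
        next_frontier = set()
        for t in graph:
            if t not in selected and (t.head in frontier or t.tail in frontier):
                selected.append(t)
                next_frontier.update((t.head, t.tail))
        frontier |= next_frontier
    return selected


def build_prompt(question: str, triples: list[Triple]) -> str:
    """Format retrieved facts, each tagged with its source, ahead of the question."""
    lines = [f"[{t.source}] {t.head} {t.relation} {t.tail}" for t in triples]
    context = "\n".join(lines) if lines else "(no graph context found)"
    return (
        "Answer using only the facts below and cite sources in brackets.\n\n"
        f"Facts:\n{context}\n\nQuestion: {question}"
    )
```

Calling `build_prompt(q, retrieve(q, GRAPH))` yields a prompt in which every fact carries a bracketed source tag, which is what makes a later traceability check possible. Raising `hops` widens the context at the cost of pulling in less relevant facts.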
Problem

Research questions and friction points this paper is trying to address.

Evaluating GraphRAG accuracy for Alzheimer's disease research
Assessing traceability of GraphRAG systems in biomedical domains
Addressing LLM hallucinations through knowledge graph integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

GraphRAG knowledge base for Alzheimer's disease
GPT-4o model for biomedical query answering
Evaluating traceability in RAG and GraphRAG systems
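One way to make "traceability" concrete is sketched below. It is an illustrative assumption, not the metric used in the paper: it supposes that answers cite sources in square brackets and scores the fraction of cited sources that were actually among the retrieved ones. The function names and the bracket convention are hypothetical.

```python
import re


def cited_sources(answer: str) -> set[str]:
    """Extract bracketed source tags such as [paper_01] from an answer."""
    return set(re.findall(r"\[([^\[\]]+)\]", answer))


def traceability(answer: str, retrieved: set[str]) -> float:
    """Fraction of cited sources that appear in the retrieved set.

    Returns 0.0 when the answer cites nothing, since an uncited answer
    cannot be traced at all under this (assumed) definition.
    """
    cited = cited_sources(answer)
    if not cited:
        return 0.0
    return len(cited & retrieved) / len(cited)
```

Under this definition, an answer that cites one retrieved source and one source that was never retrieved scores one half, and a plain LLM answer with no citations scores zero.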
Tingxuan Xu
Washington University in St. Louis
Jiarui Feng
Washington University in St. Louis
Machine Learning
Justin Melendez
Washington University in St. Louis
Kaleigh Roberts
Washington University in St. Louis
Donghong Cai
Jinan University, Guangzhou
Wireless Communication, Coding & Information Security, Signal Processing, Machine Learning
Mingfang Zhu
Washington University in St. Louis
Donald Elbert
University of Washington
Yixin Chen
Washington University in St. Louis
Randall J. Bateman
Washington University in St. Louis