MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge Graph

πŸ“… 2025-08-17
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing medical knowledge graphs (KGs) struggle to model temporal dynamics and contextual uncertainty. To address this, we propose a method for constructing a temporally evolving medical KG. Our approach employs a dual-agent collaborative framework powered by the Qwen2.5-32B-Instruct large language model: an extraction agent performs entity-relation extraction and estimates confidence scores via sampling-based generation, while a fusion agent dynamically updates the KG and resolves conflicts by integrating timestamps and confidence scores. This work pioneers fine-grained, day-level temporal modeling of the evolution of medical knowledge. Processing over 10 million PubMed abstracts incrementally, we construct a KG comprising roughly 156,000 entities and 2.97 million triples, achieving 89.7% accuracy. Evaluated on seven medical question-answering benchmarks, our KG significantly enhances retrieval-augmented generation (RAG) performance. The method establishes a novel paradigm for dynamic knowledge representation and reasoning in biomedicine.
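The summary says the extraction agent scores each triple via sampling-based generation, but gives no formula. A minimal sketch, assuming confidence is the fraction of stochastic extraction passes in which a triple reappears (the function name `extract_fn` and the threshold value are illustrative, not from the paper):

```python
from collections import Counter

def estimate_confidence(extract_fn, abstract, n_samples=5, threshold=0.6):
    """Score each (head, relation, tail) triple by the fraction of
    temperature-sampled LLM extraction passes in which it reappears.

    `extract_fn` is a hypothetical callable wrapping the LLM extractor;
    each call returns the set of triples from one sampled generation.
    """
    counts = Counter()
    for _ in range(n_samples):
        for triple in extract_fn(abstract):  # one stochastic pass
            counts[triple] += 1
    # Keep only triples whose empirical frequency clears the threshold,
    # mirroring the paper's filtering of low-confidence extractions.
    return {t: c / n_samples for t, c in counts.items()
            if c / n_samples >= threshold}
```

A triple extracted in every sample gets confidence 1.0; one appearing in a single pass out of five (0.2) is filtered out.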

πŸ“ Abstract
The rapid expansion of medical literature presents growing challenges for structuring and integrating domain knowledge at scale. Knowledge Graphs (KGs) offer a promising solution by enabling efficient retrieval, automated reasoning, and knowledge discovery. However, current KG construction methods often rely on supervised pipelines with limited generalizability or naively aggregate outputs from Large Language Models (LLMs), treating biomedical corpora as static and ignoring the temporal dynamics and contextual uncertainty of evolving knowledge. To address these limitations, we introduce MedKGent, an LLM agent framework for constructing temporally evolving medical KGs. Leveraging over 10 million PubMed abstracts published between 1975 and 2023, we simulate the emergence of biomedical knowledge via a fine-grained daily time series. MedKGent incrementally builds the KG in a day-by-day manner using two specialized agents powered by the Qwen2.5-32B-Instruct model. The Extractor Agent identifies knowledge triples and assigns confidence scores via sampling-based estimation, which are used to filter low-confidence extractions and inform downstream processing. The Constructor Agent incrementally integrates the retained triples into a temporally evolving graph, guided by confidence scores and timestamps to reinforce recurring knowledge and resolve conflicts. The resulting KG contains 156,275 entities and 2,971,384 relational triples. Quality assessments by two SOTA LLMs and three domain experts demonstrate an accuracy approaching 90%, with strong inter-rater agreement. To evaluate downstream utility, we conduct retrieval-augmented generation (RAG) across seven medical question answering benchmarks using five leading LLMs, consistently observing significant improvements over non-augmented baselines. Case studies further demonstrate the KG's value in literature-based drug repurposing via confidence-aware causal inference.
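The abstract describes the Constructor Agent as using confidence scores and timestamps to reinforce recurring knowledge and resolve conflicts, without specifying the update rule. One plausible sketch, assuming a keyed edge store where recurring triples gain support and the stronger evidence wins (the `Edge` fields and the max-based rules are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class Edge:
    confidence: float
    first_seen: str   # day-level ISO timestamp, e.g. "1998-03-14"
    last_seen: str
    support: int      # number of extractions reinforcing this triple

def integrate(graph, triple, confidence, day):
    """Hypothetical constructor-agent update: insert a new triple,
    or reinforce an existing one and keep the stronger/newer evidence."""
    key = tuple(triple)  # (head, relation, tail)
    edge = graph.get(key)
    if edge is None:
        graph[key] = Edge(confidence, day, day, 1)
    else:
        # Recurring knowledge: bump support, advance the timestamp,
        # and retain the higher confidence seen so far.
        edge.support += 1
        edge.last_seen = max(edge.last_seen, day)
        edge.confidence = max(edge.confidence, confidence)
    return graph[key]
```

The day-level `first_seen`/`last_seen` pair is what makes the graph temporally queryable, e.g. reconstructing the KG as of a given date by filtering on `first_seen`.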
Problem

Research questions and friction points this paper is trying to address.

Constructing temporally evolving medical knowledge graphs from literature
Addressing limited generalizability in current KG construction methods
Handling temporal dynamics and uncertainty in biomedical knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM agent framework for evolving medical KGs
Daily time series simulation of PubMed data
Confidence-guided triple extraction and integration
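The "daily time series simulation" above amounts to replaying the corpus in publication order. A minimal sketch, assuming each abstract record carries a day-level `pub_date` field (the field name is illustrative):

```python
from collections import defaultdict

def build_daily_stream(abstracts):
    """Group abstract records by day-level publication date so the
    literature can be replayed chronologically, one day at a time."""
    by_day = defaultdict(list)
    for record in abstracts:
        by_day[record["pub_date"]].append(record)
    # ISO dates sort lexicographically, so plain sorting is chronological.
    return sorted(by_day.items())
```

An incremental builder would then iterate over the returned (day, abstracts) pairs, extracting and integrating triples for each day before moving to the next.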
Duzhen Zhang
Institute of Automation, Chinese Academy of Sciences
Natural Language Processing, Multimodal, Large Language Models, Continual Learning, AI4Science
Zixiao Wang
University of Science and Technology of China
Zhong-Zhi Li
University of Chinese Academy of Sciences, Beijing, China
Yahan Yu
Kyoto University
Multimodal LLM, Continual Learning
Shuncheng Jia
University of Chinese Academy of Sciences, Beijing, China
Jiahua Dong
Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE
Haotian Xu
Tsinghua University, Beijing, China
Xing Wu
University of Chinese Academy of Sciences, Beijing, China
Yingying Zhang
East China Normal University, Shanghai, China
Tielin Zhang
Chinese Academy of Sciences
Spiking Neural Networks, Cognitive Computation, Computational Neuroscience
Jie Yang
Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
Xiuying Chen
MBZUAI
Trustworthy NLP, Human-Centered NLP, Computational Social Science
Le Song
CTO, GenBio AI; Professor, MBZUAI
AI, AI for Science, Machine Learning