MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge Graph

πŸ“… 2025-08-17
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing medical knowledge graphs (KGs) struggle to model temporal dynamics and contextual uncertainty. To address this, we propose a method for constructing a temporally evolving medical KG. Our approach employs a dual-agent collaborative framework powered by the Qwen2.5-32B-Instruct large language model: an extraction agent performs entity-relation extraction and estimates confidence scores via sampling-based generation, while a fusion agent dynamically updates the KG and resolves conflicts by integrating timestamps and confidence scores. This work pioneers fine-grained, day-level temporal modeling of the evolution of medical knowledge. Processing over 10 million PubMed abstracts incrementally, we construct a KG comprising roughly 156,000 entities and 2.97 million triples, achieving 89.7% accuracy. Evaluated on seven medical question-answering benchmarks, our KG significantly enhances retrieval-augmented generation (RAG) performance. The method establishes a novel paradigm for dynamic knowledge representation and reasoning in biomedicine.
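The summary says the extraction agent scores each triple via sampling-based generation, but gives no formula. A minimal sketch, assuming confidence is the fraction of stochastic extraction passes in which a triple reappears (the function name `extract_fn` and the threshold value are illustrative, not from the paper):

```python
from collections import Counter

def estimate_confidence(extract_fn, abstract, n_samples=5, threshold=0.6):
    """Score each (head, relation, tail) triple by the fraction of
    temperature-sampled LLM extraction passes in which it reappears.

    `extract_fn` is a hypothetical callable wrapping the LLM extractor;
    each call returns the set of triples from one sampled generation.
    """
    counts = Counter()
    for _ in range(n_samples):
        for triple in extract_fn(abstract):  # one stochastic pass
            counts[triple] += 1
    # Keep only triples whose empirical frequency clears the threshold,
    # mirroring the paper's filtering of low-confidence extractions.
    return {t: c / n_samples for t, c in counts.items()
            if c / n_samples >= threshold}
```

A triple extracted in every sample gets confidence 1.0; one appearing in a single pass out of five (0.2) is filtered out.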

πŸ“ Abstract
The rapid expansion of medical literature presents growing challenges for structuring and integrating domain knowledge at scale. Knowledge Graphs (KGs) offer a promising solution by enabling efficient retrieval, automated reasoning, and knowledge discovery. However, current KG construction methods often rely on supervised pipelines with limited generalizability or naively aggregate outputs from Large Language Models (LLMs), treating biomedical corpora as static and ignoring the temporal dynamics and contextual uncertainty of evolving knowledge. To address these limitations, we introduce MedKGent, an LLM agent framework for constructing temporally evolving medical KGs. Leveraging over 10 million PubMed abstracts published between 1975 and 2023, we simulate the emergence of biomedical knowledge via a fine-grained daily time series. MedKGent incrementally builds the KG in a day-by-day manner using two specialized agents powered by the Qwen2.5-32B-Instruct model. The Extractor Agent identifies knowledge triples and assigns confidence scores via sampling-based estimation, which are used to filter low-confidence extractions and inform downstream processing. The Constructor Agent incrementally integrates the retained triples into a temporally evolving graph, guided by confidence scores and timestamps to reinforce recurring knowledge and resolve conflicts. The resulting KG contains 156,275 entities and 2,971,384 relational triples. Quality assessments by two SOTA LLMs and three domain experts demonstrate an accuracy approaching 90%, with strong inter-rater agreement. To evaluate downstream utility, we conduct retrieval-augmented generation (RAG) across seven medical question answering benchmarks using five leading LLMs, consistently observing significant improvements over non-augmented baselines. Case studies further demonstrate the KG's value in literature-based drug repurposing via confidence-aware causal inference.
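The abstract describes the Constructor Agent as using confidence scores and timestamps to reinforce recurring knowledge and resolve conflicts, without specifying the update rule. One plausible sketch, assuming a keyed edge store where recurring triples gain support and the stronger evidence wins (the `Edge` fields and the max-based rules are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class Edge:
    confidence: float
    first_seen: str   # day-level ISO timestamp, e.g. "1998-03-14"
    last_seen: str
    support: int      # number of extractions reinforcing this triple

def integrate(graph, triple, confidence, day):
    """Hypothetical constructor-agent update: insert a new triple,
    or reinforce an existing one and keep the stronger/newer evidence."""
    key = tuple(triple)  # (head, relation, tail)
    edge = graph.get(key)
    if edge is None:
        graph[key] = Edge(confidence, day, day, 1)
    else:
        # Recurring knowledge: bump support, advance the timestamp,
        # and retain the higher confidence seen so far.
        edge.support += 1
        edge.last_seen = max(edge.last_seen, day)
        edge.confidence = max(edge.confidence, confidence)
    return graph[key]
```

The day-level `first_seen`/`last_seen` pair is what makes the graph temporally queryable, e.g. reconstructing the KG as of a given date by filtering on `first_seen`.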
Problem

Research questions and friction points this paper is trying to address.

Constructing temporally evolving medical knowledge graphs from literature
Addressing limited generalizability in current KG construction methods
Handling temporal dynamics and uncertainty in biomedical knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM agent framework for evolving medical KGs
Daily time series simulation of PubMed data
Confidence-guided triple extraction and integration
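The "daily time series simulation" above amounts to replaying the corpus in publication order. A minimal sketch, assuming each abstract record carries a day-level `pub_date` field (the field name is illustrative):

```python
from collections import defaultdict

def build_daily_stream(abstracts):
    """Group abstract records by day-level publication date so the
    literature can be replayed chronologically, one day at a time."""
    by_day = defaultdict(list)
    for record in abstracts:
        by_day[record["pub_date"]].append(record)
    # ISO dates sort lexicographically, so plain sorting is chronological.
    return sorted(by_day.items())
```

An incremental builder would then iterate over the returned (day, abstracts) pairs, extracting and integrating triples for each day before moving to the next.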
Duzhen Zhang
Institute of Automation, Chinese Academy of Sciences
Natural Language Processing, Multimodal, Large Language Models, Continual Learning, AI4Science
Zixiao Wang
University of Science and Technology of China
Zhong-Zhi Li
University of Chinese Academy of Sciences, Beijing, China
Yahan Yu
Kyoto University
Multimodal LLM, Continual Learning
Shuncheng Jia
University of Chinese Academy of Sciences, Beijing, China
Jiahua Dong
Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE
Haotian Xu
Tsinghua University, Beijing, China
Xing Wu
University of Chinese Academy of Sciences, Beijing, China
Yingying Zhang
East China Normal University, Shanghai, China
Tielin Zhang
Chinese Academy of Sciences
Spiking Neural Networks, Cognitive Computation, Computational Neuroscience
Jie Yang
Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
Xiuying Chen
MBZUAI
Trustworthy NLP, Human-Centered NLP, Computational Social Science
Le Song
CTO, GenBio AI; Professor, MBZUAI
AI, AI for Science, Machine Learning