An Epidemiological Knowledge Graph extracted from the World Health Organization's Disease Outbreak News

📅 2025-09-02
🤖 AI Summary
To address the challenge of leveraging unstructured WHO pandemic news texts for real-time epidemiological analysis, this paper proposes a multi-large language model (LLM) fusion framework for automated, fine-grained knowledge extraction—accurately identifying and structuring key epidemiological entities (e.g., pathogen, location, time, case count, transmission mode) and their relations. Built upon this framework, we construct eKG—the first globally scoped, daily-updated dynamic epidemic knowledge graph—accompanied by open data APIs and analytical tools. Our contributions are threefold: (1) the first dynamic knowledge graph explicitly designed for pandemic surveillance; (2) an LLM ensemble verification mechanism that significantly improves entity and relation extraction accuracy; and (3) a fully automated, end-to-end pipeline enabling minute-scale updates from raw reports to the knowledge graph. eKG has already supported multiple public health research initiatives and real-time risk assessments, substantially enhancing outbreak response efficiency.

📝 Abstract
The rapid evolution of artificial intelligence (AI), together with the increased availability of social media and news for epidemiological surveillance, is marking a pivotal moment in epidemiology and public health research. Leveraging the power of generative AI, we use an ensemble approach that incorporates multiple Large Language Models (LLMs) to extract valuable, actionable epidemiological information from the World Health Organization (WHO) Disease Outbreak News (DONs). DONs is a collection of regular reports, curated by the WHO, on global outbreaks and on the decision-making processes adopted to respond to them. The extracted information is made available in a daily-updated dataset and a knowledge graph, referred to as eKG, derived to provide a nuanced representation of public health domain knowledge. We provide an overview of this new dataset and describe the structure of eKG, along with the services and tools that we are building on top of it to access and use the data. These data resources open new opportunities for epidemiological research and for the analysis and surveillance of disease outbreaks.
Problem

Research questions and friction points this paper is trying to address.

Extracting actionable epidemiological information from WHO reports
Building a knowledge graph for public health domain representation
Enabling improved disease outbreak analysis and surveillance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using ensemble LLMs for information extraction
Constructing a daily-updated epidemiological knowledge graph
Deriving actionable insights from WHO outbreak reports
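
The ensemble idea listed above can be illustrated with a minimal sketch. The page does not describe how the paper combines the outputs of its LLMs, so the majority vote used here is an assumption, not the authors' method. The field names (`disease`, `country`, `case_count`), the `has_<field>` predicate naming, the report identifier, and the sample outputs are all hypothetical and made up for illustration.

```python
from collections import Counter


def majority_vote(candidates, min_agreement=2):
    """Return the most common non-empty value, or None if too few extractors agree.

    Values are compared case-insensitively (assumption: casing is not meaningful).
    """
    values = [c.strip().lower() for c in candidates if c and c.strip()]
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count >= min_agreement else None


def fuse_extractions(extractions, fields):
    """Fuse per-extractor dictionaries into one record, field by field."""
    return {f: majority_vote([e.get(f) for e in extractions]) for f in fields}


def to_triples(report_id, record):
    """Turn a fused record into (subject, predicate, object) triples, skipping empty fields."""
    return [
        (report_id, f"has_{field}", value)
        for field, value in record.items()
        if value is not None
    ]


# Hypothetical outputs of three extractors for the same report (made-up data).
outputs = [
    {"disease": "Cholera", "country": "Yemen"},
    {"disease": "cholera", "country": "Yemen"},
    {"disease": "Cholera", "country": "Oman"},
]

fused = fuse_extractions(outputs, ["disease", "country", "case_count"])
triples = to_triples("DON-example", fused)
print(fused)
print(triples)
```

In this sketch a field is kept only when at least two extractors agree, so a value reported by a single model (here the third extractor's country) is dropped rather than propagated into the graph. A real pipeline would likely need entity normalization and provenance tracking beyond what is shown.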
Sergio Consoli
European Commission, Joint Research Centre
Data Science, Operational Research, Artificial Intelligence, Knowledge Engineering, Optimization
Pietro Coletti
European Commission, Joint Research Centre (JRC), Ispra, Italy; Université catholique de Louvain, Institute of Health and Society (IRSS), Brussels, Belgium
Peter V. Markov
European Commission, Joint Research Centre (JRC), Ispra, Italy; London School of Hygiene and Tropical Medicine (LSHTM), London, United Kingdom
Lia Orfei
European Commission, Joint Research Centre (JRC), Ispra, Italy
Indaco Biazzo
European Commission, Joint Research Centre (JRC), Ispra, Italy
Lea Schuh
European Commission, Joint Research Centre (JRC), Ispra, Italy
Nicolas Stefanovitch
European Commission
artificial intelligence, natural language processing, data analysis, multiagent systems, graphical models
Lorenzo Bertolini
European Commission, Joint Research Centre (JRC)
Natural Language Processing, Representation Learning, AI for Health
Mario Ceresa
Joint Research Center, European Commission
Artificial Intelligence, Machine Learning, Deep Learning, Reinforcement Learning, Health
Nikolaos I. Stilianakis
European Commission, Joint Research Centre (JRC), Ispra, Italy