🤖 AI Summary
To address the challenge of leveraging unstructured WHO Disease Outbreak News (DONs) reports for real-time epidemiological analysis, this paper proposes a multi-large language model (LLM) fusion framework for automated, fine-grained knowledge extraction that identifies and structures key epidemiological entities (e.g., pathogen, location, time, case count, transmission mode) and their relations. Built upon this framework, we construct eKG, described as the first globally scoped, daily-updated dynamic epidemic knowledge graph, accompanied by open data APIs and analytical tools. Our contributions are threefold: (1) the first dynamic knowledge graph explicitly designed for pandemic surveillance; (2) an LLM ensemble verification mechanism that significantly improves entity and relation extraction accuracy; and (3) a fully automated, end-to-end pipeline enabling minute-scale updates from raw reports to the knowledge graph. eKG has already supported multiple public health research initiatives and real-time risk assessments, substantially enhancing outbreak response efficiency.
📝 Abstract
The rapid evolution of artificial intelligence (AI), together with the increased availability of social media and news for epidemiological surveillance, marks a pivotal moment in epidemiology and public health research. Leveraging generative AI, we use an ensemble approach that combines multiple Large Language Models (LLMs) to extract valuable, actionable epidemiological information from the World Health Organization (WHO) Disease Outbreak News (DONs). DONs is a collection of regular reports, curated by the WHO, on global outbreaks and on the decision-making processes adopted to respond to them. The extracted information is made available as a daily-updated dataset and as a knowledge graph, referred to as eKG, built to provide a nuanced representation of public health domain knowledge. We provide an overview of this new dataset and describe the structure of eKG, along with the services and tools that we are building on top of it to access and use the data. These data resources open entirely new opportunities for epidemiological research and for the analysis and surveillance of disease outbreaks.
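The abstract does not specify how the outputs of the individual LLMs are combined. As a rough illustration only, one common way to build such an ensemble is field-level majority voting over the structured records each model returns. The sketch below assumes that scheme; the function name `ensemble_vote`, the `min_agreement` parameter, the field names, and the placeholder values are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def ensemble_vote(extractions, min_agreement=2):
    """Merge per-model extraction records by field-level majority vote.

    Assumption: each model returns a flat dict of field -> value for one
    report. A field is kept only if at least `min_agreement` models
    returned the same value for it.
    """
    fields = {field for record in extractions for field in record}
    merged = {}
    for field in sorted(fields):
        # Collect the values the models proposed for this field.
        values = [r[field] for r in extractions if r.get(field) is not None]
        if not values:
            continue
        value, count = Counter(values).most_common(1)[0]
        if count >= min_agreement:
            merged[field] = value
    return merged


# Placeholder outputs from three hypothetical models for a single report.
model_outputs = [
    {"disease": "disease-A", "country": "country-X", "cases": 120},
    {"disease": "disease-A", "country": "country-X", "cases": 125},
    {"disease": "disease-A", "country": "country-X", "cases": 120},
]
consensus = ensemble_vote(model_outputs)
print(consensus)
```

In this scheme a value proposed by only one model is dropped rather than guessed, which trades recall for precision; a real pipeline might instead route such disagreements to a further verification step.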