Health Sentinel: An AI Pipeline For Real-time Disease Outbreak Detection

📅 2025-06-24
🤖 AI Summary
This study addresses the challenge of monitoring unusual health events, such as disease outbreaks, in real time across massive online media streams. We propose a multi-stage automated information extraction framework that combines rule-based matching, event extraction, deduplication and clustering, and machine learning models to convert unstructured text into structured epidemic events (including time, location, pathogen, case count, etc.). Our key contribution is a lightweight, interpretable hybrid pipeline that is robust to noisy web content and aims for accurate detection with few false positives. Deployed since April 2022, the system has processed over 300 million articles and identified over 95,000 unique health events, of which over 3,500 were shortlisted by public health experts at NCDC as potential outbreaks, with the aim of supporting earlier warning and faster public health response.

📝 Abstract
Early detection of disease outbreaks is crucial to ensure timely intervention by health authorities. Due to the challenges associated with traditional indicator-based surveillance, monitoring informal sources such as online media has become increasingly popular. However, given the volume of online articles published every day, manual screening is impractical. To address this, we propose Health Sentinel, a multi-stage information extraction pipeline that combines ML and non-ML methods to extract events (structured information concerning disease outbreaks or other unusual health events) from online articles. The extracted events are made available to the Media Scanning and Verification Cell (MSVC) at the National Centre for Disease Control (NCDC), Delhi, for analysis, interpretation, and further dissemination to local agencies for timely intervention. Since April 2022, Health Sentinel has processed over 300 million news articles and identified over 95,000 unique health events across India, of which over 3,500 were shortlisted by public health experts at NCDC as potential outbreaks.
Problem

Research questions and friction points this paper is trying to address.

Detect disease outbreaks in real time from online media
Overcome the limitations of manual screening with an AI pipeline
Extract structured health events for timely intervention
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI pipeline for real-time outbreak detection
Combines ML and non-ML information extraction
Processes millions of online articles automatically
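
The multi-stage idea summarized above (a non-ML filter, an extraction step, and deduplication into unique events) can be illustrated with a toy sketch. This is not the Health Sentinel implementation: the keyword list, the `HealthEvent` fields, the case-count regex, and the choice to deduplicate on (disease, location) are all assumptions made for illustration, and the extraction stage is a rule-based stand-in for what the paper describes as ML-based extraction.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical keyword list; the real system's vocabulary is not given here.
DISEASES = ("cholera", "dengue", "measles", "malaria")
# Hypothetical pattern for phrases like "120 cases" or "150 new cases".
CASE_PATTERN = re.compile(r"(\d[\d,]*)\s+(?:new\s+)?cases?", re.IGNORECASE)


@dataclass(frozen=True)
class HealthEvent:
    disease: str
    location: Optional[str]
    cases: Optional[int]


def rule_filter(article: str) -> bool:
    """Non-ML stage: keep only articles that mention a known disease keyword."""
    text = article.lower()
    return any(d in text for d in DISEASES)


def extract_event(article: str, locations: Iterable[str]) -> Optional[HealthEvent]:
    """Rule-based stand-in for the extraction stage (assumed, not the paper's model)."""
    text = article.lower()
    disease = next((d for d in DISEASES if d in text), None)
    if disease is None:
        return None
    location = next((loc for loc in locations if loc.lower() in text), None)
    match = CASE_PATTERN.search(article)
    cases = int(match.group(1).replace(",", "")) if match else None
    return HealthEvent(disease, location, cases)


def run_pipeline(articles: Iterable[str], locations: List[str]) -> List[HealthEvent]:
    """Filter, extract, then deduplicate on the assumed key (disease, location)."""
    unique = {}
    for article in articles:
        if not rule_filter(article):
            continue
        event = extract_event(article, locations)
        if event is None:
            continue
        key = (event.disease, event.location)
        # Keep the report with the highest case count for each key.
        if key not in unique or (event.cases or 0) > (unique[key].cases or 0):
            unique[key] = event
    return list(unique.values())


if __name__ == "__main__":
    sample = [
        "Dengue outbreak in Pune: 120 cases reported",
        "Pune reports 150 new cases of dengue",
        "Cricket match in Delhi",
        "Cholera suspected in Patna",
    ]
    for event in run_pipeline(sample, ["Pune", "Delhi", "Patna"]):
        print(event)
```

In this sketch the two Pune dengue reports collapse into a single event and the unrelated article is dropped by the filter. A production system would need far more robust location resolution, clustering across paraphrased reports, and expert review of shortlisted events, as the abstract describes.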
Devesh Pant
Master's Student, Indian Institute of Technology, Delhi
Computer Vision, Machine Learning, OCR
Rishi Raj Grandhe
Work done while working at Wadhwani AI
Vipin Samaria
Wadhwani AI, India
Mukul Paul
Wadhwani AI, India
Sudhir Kumar
Wadhwani AI, India
Saransh Khanna
Wadhwani AI, India
Jatin Agrawal
Work done while working at Wadhwani AI
Jushaan Singh Kalra
Carnegie Mellon University
Natural Language Processing
Akhil VSSG
Work done while working at Wadhwani AI
Satish V Khalikar
Wadhwani AI, India
Vipin Garg
Wadhwani AI, India
Himanshu Chauhan
National Centre for Disease Control, Government of India
Pranay Verma
National Centre for Disease Control, Government of India
Neha Khandelwal
Work done while working at Wadhwani AI
Soma S Dhavala
Work done while working at Wadhwani AI
Minesh Mathew
Wadhwani AI, India