Design and Implementation of a Scalable Clinical Data Warehouse for Resource-Constrained Healthcare Systems

📅 2025-02-23
🤖 AI Summary
To address critical bottlenecks in resource-constrained healthcare systems, including the lack of unique patient identifiers, data fragmentation, and poor interoperability, this paper designs and implements NCDW, a scalable, privacy-preserving clinical data warehouse. Methodologically, it uses a Soundex-based patient matching mechanism for record linkage without unique IDs and a wrapper-based acquisition layer for ingesting multi-source data. It also compares database technologies and finds that NoSQL outperforms relational SQL by 40–69% on complex queries. It further proposes a modular, disease-specific data mart that integrates clinical, demographic, and environmental data. NCDW was tested with 1.16 million clinical records, and system load estimates indicate capacity for 19 million records per day (34 TB over five years). A dengue fever case study in Bangladesh demonstrates the data mart for outbreak prediction and resource planning. The ingestion layer can be modified to accommodate standards such as ICD-11 and HL7 FHIR, which supports adaptation to other infectious diseases such as tuberculosis and COVID-19.
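
One way to picture a wrapper-based acquisition layer is as a set of small adapters, one per data source, that each map source-specific rows into a shared schema before loading. The sketch below is only an illustration under that assumption: the `PatientVisit` schema, the source names, and the field names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator


@dataclass(frozen=True)
class PatientVisit:
    """Hypothetical common schema; field names are illustrative only."""
    source: str
    patient_name: str
    diagnosis_code: str


# A wrapper is any function that turns one raw source row into a PatientVisit.
Wrapper = Callable[[dict], PatientVisit]


def hospital_a_wrapper(row: dict) -> PatientVisit:
    # Hypothetical source A stores the full name in one field.
    return PatientVisit("hospital_a", row["name"].strip(), row["dx"].upper())


def lab_b_wrapper(row: dict) -> PatientVisit:
    # Hypothetical source B splits the name into two fields.
    full_name = f'{row["first"]} {row["last"]}'
    return PatientVisit("lab_b", full_name, row["code"].upper())


WRAPPERS: Dict[str, Wrapper] = {
    "hospital_a": hospital_a_wrapper,
    "lab_b": lab_b_wrapper,
}


def ingest(batches: Dict[str, Iterable[dict]],
           wrappers: Dict[str, Wrapper]) -> Iterator[PatientVisit]:
    """Yield normalized records, dispatching each batch to its source wrapper."""
    for source, rows in batches.items():
        wrap = wrappers[source]
        for row in rows:
            yield wrap(row)


if __name__ == "__main__":
    sample = {
        "hospital_a": [{"name": " Abdul Karim ", "dx": "x1"}],
        "lab_b": [{"first": "Abdul", "last": "Karim", "code": "x1"}],
    }
    for visit in ingest(sample, WRAPPERS):
        print(visit)
```

Under this design, supporting a new source means writing one more wrapper function and registering it; the downstream loading code does not change.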

📝 Abstract
Centralized electronic health record repositories are critical for advancing disease surveillance, public health research, and evidence-based policymaking. However, developing countries face persistent challenges in building them because of fragmented healthcare data sources, inconsistent record-keeping practices, and the absence of standardized patient identifiers. These problems limit reliable record linkage, compromise data interoperability, and restrict scalability, and they are exacerbated by infrastructural constraints and privacy concerns. To address these barriers, this study proposes a scalable, privacy-preserving clinical data warehouse, NCDW, designed for heterogeneous EHR integration in resource-limited settings and tested with 1.16 million clinical records. The framework incorporates a wrapper-based data acquisition layer for secure, automated ingestion of multisource health data and introduces a Soundex algorithm to resolve patient identity mismatches in the absence of unique IDs. A modular data mart is designed for disease-specific analytics and demonstrated through a dengue fever case study in Bangladesh that integrates clinical, demographic, and environmental data for outbreak prediction and resource planning. Quantitative assessment of the data mart underscores its utility in strengthening national decision-support systems and highlights the model's adaptability for infectious disease management. Comparative evaluation of database technologies reveals that NoSQL outperforms relational SQL by 40–69% in complex query processing, while system load estimates validate the architecture's capacity to manage 19 million daily records (34 TB over 5 years). The framework can be adapted to various healthcare settings across developing nations by modifying the ingestion layer to accommodate standards such as ICD-11 and HL7 FHIR, facilitating interoperability for managing infectious diseases (e.g., COVID-19, tuberculosis).
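
The load figures in the abstract can be sanity-checked with simple arithmetic. The sketch below is a back-of-envelope consistency check, not the paper's sizing method: it assumes decimal terabytes, 365-day years, and a uniform arrival rate, and it ignores indexes, replication, and compression, none of which are specified here.

```python
# Figures quoted in the abstract.
RECORDS_PER_DAY = 19_000_000
YEARS = 5
TOTAL_TB = 34

# Assumptions made for this check only (not stated in the source).
DAYS_PER_YEAR = 365
BYTES_PER_TB = 10**12          # decimal terabyte
SECONDS_PER_DAY = 86_400

total_records = RECORDS_PER_DAY * DAYS_PER_YEAR * YEARS
bytes_per_record = TOTAL_TB * BYTES_PER_TB / total_records
records_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY

print(f"records over {YEARS} years: {total_records:,}")
print(f"implied bytes per record: {bytes_per_record:.0f}")
print(f"average ingest rate (records/s): {records_per_second:.0f}")
```

Under these assumptions the two quoted figures imply about 34.7 billion records in five years, roughly 1 KB of storage per record, and an average ingest rate of about 220 records per second.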
Problem

Research questions and friction points this paper is trying to address.

Develop scalable data warehouse for healthcare systems
Integrate heterogeneous EHR in resource-limited settings
Enhance disease surveillance and decision-support systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scalable clinical data warehouse
Wrapper-based data acquisition
Soundex algorithm identity resolution
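
The Soundex-based identity resolution listed above relies on a phonetic code that maps similar-sounding names to the same key, so that spelling variants of one patient's name can be grouped as candidate matches. The sketch below implements standard American Soundex as an illustration; the exact variant the paper uses, and any additional fields it combines with the code, are not stated here and are not assumed.

```python
def soundex(name: str) -> str:
    """Return the 4-character American Soundex code for a name ('' if no letters)."""
    # Build the consonant-to-digit table; vowels, H, W, and Y have no digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""

    first = letters[0]
    result = [first]
    prev = codes.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = codes.get(ch, "")  # vowels and Y map to ""
        if digit and digit != prev:
            result.append(digit)
        prev = digit  # a vowel resets prev, so a repeated code counts again

    # Pad with zeros and truncate to one letter plus three digits.
    return ("".join(result) + "000")[:4]


if __name__ == "__main__":
    for a, b in (("Rahman", "Rehman"), ("Mohammad", "Muhammed")):
        print(a, soundex(a), b, soundex(b), soundex(a) == soundex(b))
```

Records whose name codes agree are only candidates for linkage; a practical pipeline would normally confirm them against other attributes, since unrelated names can share a Soundex code.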
Shovito Barua Soumma
PhD Student, Arizona State University
Deep Learning, Mobile Health, Wearables, Embedded Systems, Signal Processing
Fahim Shahriar
Computer Science Department, University of Minnesota Duluth, USA
Umme Niraj Mahi
Computer Science and Engineering Department, Khulna University of Engineering and Technology, Bangladesh
Md Hasin Abrar
Computer Science and Engineering Department, Bangladesh University of Engineering and Technology, Bangladesh
Md Abdur Rahman Fahad
Computer Science and Engineering Department, Bangladesh University of Engineering and Technology, Bangladesh
ASM Latiful Hoque
Computer Science and Engineering Department, Bangladesh University of Engineering and Technology, Bangladesh