Evaluating Large Language Models on Historical Health Crisis Knowledge in Resource-Limited Settings: A Hybrid Multi-Metric Study

📅 2026-03-20
🤖 AI Summary
This study evaluates the reliability and applicability of large language models (LLMs) in disseminating historical public health crisis knowledge in resource-constrained settings. Focusing on COVID-19, dengue, Nipah virus, and chikungunya, the authors construct a multilingual question-answering dataset and introduce a hybrid evaluation framework that integrates semantic similarity, natural language inference, and expert-model cross-evaluation. The framework is applied to four mainstream LLMs (GPT-4, Gemini Pro, Llama 3, and Mistral-7B) in the low-resource context of Bangladesh. The findings highlight both strengths and limitations in the models' ability to convey complex epidemiological knowledge, offering empirical evidence and cautionary insights for deploying LLMs to support public health decision-making in underserved regions.

📝 Abstract
Large Language Models (LLMs) offer significant potential for delivering health information. However, their reliability in low-resource contexts remains uncertain. This study evaluates GPT-4, Gemini Pro, Llama 3, and Mistral-7B on health crisis-related enquiries concerning COVID-19, dengue, the Nipah virus, and chikungunya in the low-resource context of Bangladesh. We constructed a question-answer dataset from authoritative sources and assessed model outputs through semantic similarity, expert-model cross-evaluation, and Natural Language Inference (NLI). Findings highlight both the strengths and limitations of LLMs in representing epidemiological history and health crisis knowledge, underscoring their promise and risks for informing policy in resource-constrained environments.
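
As a rough illustration of how the three signals named in the abstract could be combined for a single model answer, here is a minimal Python sketch. The abstract does not say how, or whether, the study aggregates its metrics, so the weighted average, the `lexical_cosine` and `hybrid_score` names, the NLI label-to-score mapping, and the assumption that expert ratings are scaled to [0, 1] are all hypothetical. The bag-of-words cosine is a dependency-free stand-in for an embedding-based semantic similarity, and the NLI label is taken as an input rather than predicted by a model.

```python
import math
import re
from collections import Counter

# Hypothetical mapping from an NLI label to a score in [0, 1] (an assumption).
NLI_SCORES = {"entailment": 1.0, "neutral": 0.5, "contradiction": 0.0}


def lexical_cosine(reference: str, answer: str) -> float:
    """Bag-of-words cosine similarity, a stand-in for embedding similarity."""
    ref_counts = Counter(re.findall(r"[a-z0-9]+", reference.lower()))
    ans_counts = Counter(re.findall(r"[a-z0-9]+", answer.lower()))
    dot = sum(ref_counts[tok] * ans_counts[tok] for tok in ref_counts)
    norm = math.sqrt(sum(v * v for v in ref_counts.values())) * math.sqrt(
        sum(v * v for v in ans_counts.values())
    )
    return dot / norm if norm else 0.0


def hybrid_score(reference, answer, nli_label, expert_rating,
                 weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of similarity, NLI, and expert scores (illustrative only)."""
    if nli_label not in NLI_SCORES:
        raise ValueError(f"unknown NLI label: {nli_label!r}")
    if not 0.0 <= expert_rating <= 1.0:
        raise ValueError("expert_rating is assumed to lie in [0, 1]")
    components = (
        lexical_cosine(reference, answer),
        NLI_SCORES[nli_label],
        expert_rating,
    )
    return sum(w * c for w, c in zip(weights, components))


# Example call with made-up inputs: one reference answer and one model answer.
score = hybrid_score(
    reference="Dengue is transmitted by Aedes mosquitoes.",
    answer="Dengue spreads through the bite of Aedes mosquitoes.",
    nli_label="entailment",
    expert_rating=0.8,
)
print(round(score, 3))
```

Keeping the three components separate until the last step makes it easy to report them individually, which matters when a fluent answer scores well on similarity but is judged a contradiction by NLI or by an expert.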
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Health Crisis Knowledge
Resource-Limited Settings
Historical Epidemiology
Model Reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Multi-Metric Evaluation
Large Language Models
Health Crisis Knowledge
Resource-Limited Settings
Natural Language Inference