🤖 AI Summary
This study evaluates how reliably large language models (LLMs) convey knowledge about past public health crises in resource-constrained settings. Focusing on COVID-19, dengue, Nipah virus, and chikungunya, the authors construct a multilingual question-answering dataset and introduce a novel hybrid evaluation framework that integrates semantic similarity, natural language inference, and expert–model cross-validation. The framework is systematically applied to assess mainstream LLMs, including GPT-4, Gemini Pro, Llama 3, and Mistral-7B, in low-resource contexts such as Bangladesh. The findings reveal critical limitations and potential risks in the models' ability to accurately convey complex epidemiological knowledge, offering empirical evidence and cautionary insights for deploying LLMs to support public health decision-making in underserved regions.
📝 Abstract
Large Language Models (LLMs) offer significant potential for delivering health information. However, their reliability in low-resource contexts remains uncertain. This study evaluates GPT-4, Gemini Pro, Llama 3, and Mistral-7B on health crisis-related enquiries concerning COVID-19, dengue, Nipah virus, and chikungunya in the low-resource context of Bangladesh. We constructed a question-answer dataset from authoritative sources and assessed model outputs through semantic similarity, expert-model cross-evaluation, and Natural Language Inference (NLI). Findings highlight both the strengths and limitations of LLMs in representing epidemiological history and health crisis knowledge, underscoring their promise and risks for informing policy in resource-constrained environments.
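As a rough illustration of the similarity-based part of such an evaluation, the sketch below scores a model answer against a reference answer. It is not the authors' pipeline: the abstract does not say which similarity measure or embedding model was used, so this example substitutes a plain token-count cosine (a lexical stand-in for an embedding-based semantic similarity), and the function names and the answer pair are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the token-count vectors of two strings.

    Assumption: this lexical measure stands in for the embedding-based
    semantic similarity a real evaluation would likely use.
    """
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


# Hypothetical reference/model answer pair, for illustration only.
reference = "Dengue is transmitted by Aedes mosquitoes."
model_answer = "Aedes mosquitoes transmit dengue."
score = cosine_similarity(reference, model_answer)
```

A lexical score like this misses paraphrases ("transmit" and "transmitted" count as different tokens here), which is one reason an evaluation might pair similarity with NLI and expert review, as the abstract describes.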