Developing and evaluating a chatbot to support maternal health care

📅 2026-03-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the challenge of providing safe and effective health information to pregnant women in resource-limited settings, where low health literacy, ambiguous queries, and multilingual communication often impede access to care. The authors present a multilingual chatbot tailored for pregnant women in India, integrating stage-aware triage, guideline-based hybrid retrieval, and evidence-conditioned generation using large language models (LLMs), all within a defense-in-depth architecture to ensure response safety. To evaluate performance in high-risk scenarios under limited expert supervision, they propose a multidimensional assessment framework comprising a triage benchmark (achieving 86.7% recall for emergencies), a synthetically generated retrieval dataset, and an LLM-as-judge mechanism. Clinical reliability was validated on 150 triage samples, 100 evidence-annotated responses, and 781 real user queries.
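The summary reports the triage benchmark's headline number (86.7% emergency recall) alongside the missed-emergency vs. over-escalation trade-off. A minimal sketch of how those two quantities relate, assuming a simple two-way label scheme (`EMERGENCY` vs. everything else) and illustrative data, neither of which is taken from the paper:

```python
# Hedged sketch: emergency recall vs. over-escalation on a labeled
# triage benchmark. Label names and the sample data are illustrative.

def triage_tradeoff(gold, pred, emergency="EMERGENCY"):
    """Return (emergency recall, over-escalation rate).

    gold/pred are parallel lists of triage labels. Over-escalation is
    the fraction of non-emergency queries routed as emergencies.
    """
    emerg = [(g, p) for g, p in zip(gold, pred) if g == emergency]
    other = [(g, p) for g, p in zip(gold, pred) if g != emergency]
    recall = sum(p == emergency for _, p in emerg) / len(emerg) if emerg else 0.0
    over = sum(p == emergency for _, p in other) / len(other) if other else 0.0
    return recall, over

gold = ["EMERGENCY", "EMERGENCY", "ROUTINE", "ROUTINE", "ROUTINE"]
pred = ["EMERGENCY", "ROUTINE", "EMERGENCY", "ROUTINE", "ROUTINE"]
recall, over = triage_tradeoff(gold, pred)  # recall 0.5, over-escalation ~0.33
```

Reporting both numbers together, as the authors do, makes the safety trade-off explicit: pushing recall toward 100% typically raises the over-escalation rate.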

📝 Abstract
The ability to provide trustworthy maternal health information using phone-based chatbots can have a significant impact, particularly in low-resource settings where users have low health literacy and limited access to care. However, deploying such systems is technically challenging: user queries are short, underspecified, and code-mixed across languages, answers require grounding in regional context, and partial or missing symptom context makes safe routing decisions difficult. We present a chatbot for maternal health in India developed through a partnership between academic researchers, a health tech company, a public health nonprofit, and a hospital. The system combines (1) stage-aware triage, routing high-risk queries to expert templates, (2) hybrid retrieval over curated maternal/newborn guidelines, and (3) evidence-conditioned generation from an LLM. Our core contribution is an evaluation workflow for high-stakes deployment under limited expert supervision. Targeting both component-level and end-to-end testing, we introduce: (i) a labeled triage benchmark (N=150) achieving 86.7% emergency recall, explicitly reporting the missed-emergency vs. over-escalation trade-off; (ii) a synthetic multi-evidence retrieval benchmark (N=100) with chunk-level evidence labels; (iii) LLM-as-judge comparison on real queries (N=781) using clinician-codesigned criteria; and (iv) expert validation. Our findings show that trustworthy medical assistants in multilingual, noisy settings require defense-in-depth design paired with multi-method evaluation, rather than any single choice of model or evaluation method.
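The abstract's hybrid retrieval over curated guidelines implies fusing a lexical retriever with a dense one. One common way to combine such rankings is reciprocal rank fusion (RRF); the sketch below uses toy ranked lists as stand-ins, since the paper's abstract does not specify the actual retrievers or fusion method:

```python
# Hedged sketch of hybrid retrieval via reciprocal rank fusion (RRF).
# Document ids and both input rankings are illustrative placeholders.

def rrf(rankings, k=60):
    """Fuse several ranked lists of doc ids with reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["g3", "g1", "g7"]  # e.g. keyword match over guideline chunks
dense = ["g1", "g5", "g3"]    # e.g. embedding nearest neighbours
fused = rrf([lexical, dense])  # ["g1", "g3", "g5", "g7"]
```

RRF needs no score calibration between the two retrievers, which is why it is a common default for hybrid setups; documents ranked highly by both lists rise to the top.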
Problem

Research questions and friction points this paper is trying to address.

maternal health
chatbot
low-resource settings
health literacy
multilingual
Innovation

Methods, ideas, or system contributions that make the work stand out.

maternal health chatbot
stage-aware triage
hybrid retrieval
evidence-conditioned generation
multi-method evaluation
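The LLM-as-judge comparison listed above uses clinician-codesigned criteria. A minimal sketch of that pattern, with placeholder criteria, a hypothetical prompt builder, and a simple per-criterion vote aggregator; the paper does not publish its exact rubric or judge model:

```python
# Hedged sketch of pairwise LLM-as-judge evaluation. The criteria,
# prompt wording, and aggregation rule are illustrative assumptions.

CRITERIA = ["medical accuracy", "safety of advice", "clarity for low health literacy"]

def build_judge_prompt(query, answer_a, answer_b):
    """Assemble a pairwise-comparison prompt for a judge model."""
    lines = [
        "You are comparing two answers to a maternal-health query.",
        f"Query: {query}",
        f"Answer A: {answer_a}",
        f"Answer B: {answer_b}",
        "For each criterion, reply A, B, or TIE:",
    ]
    lines += [f"- {c}" for c in CRITERIA]
    return "\n".join(lines)

def majority_vote(verdicts):
    """Aggregate per-criterion verdicts ('A'/'B'/'TIE') into one winner."""
    a, b = verdicts.count("A"), verdicts.count("B")
    return "A" if a > b else "B" if b > a else "TIE"

winner = majority_vote(["A", "TIE", "A"])  # "A"
```

In practice the judge's per-criterion verdicts would be spot-checked against clinician labels, which is consistent with the expert-validation step the abstract describes.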