🤖 AI Summary
This study addresses the high prevalence of diagnostic and therapeutic errors in primary care through a real-world quality improvement trial. Across 15 Penda Health clinics in Kenya, we deployed AI Consult, a lightweight large language model (LLM)-based clinical decision support tool integrated non-intrusively into clinical workflows; it delivers real-time, context-aware feedback only at critical decision points, preserving clinician autonomy and safety. Analyzing nearly 40,000 outpatient encounters, we observed a 16% relative reduction in diagnostic errors and a 13% relative reduction in treatment errors, translating to an estimated 22,000 averted diagnostic errors and 29,000 averted treatment errors annually at Penda. The study introduces the first responsible AI implementation framework designed specifically for primary care settings, empirically validating LLMs as a "clinical safety net," and provides a scalable methodology and robust evidence for deploying trustworthy AI in resource-constrained health systems.
📝 Abstract
We evaluate the impact of large language model-based clinical decision support in live care. In partnership with Penda Health, a network of primary care clinics in Nairobi, Kenya, we studied AI Consult, a tool that serves as a safety net for clinicians by identifying potential documentation and clinical decision-making errors. AI Consult integrates into clinician workflows, activating only when needed and preserving clinician autonomy. We conducted a quality improvement study comparing outcomes for 39,849 patient visits performed by clinicians with or without access to AI Consult across 15 clinics. Visits were rated by independent physicians to identify clinical errors. Clinicians with access to AI Consult made fewer errors: a 16% relative reduction in diagnostic errors and a 13% relative reduction in treatment errors. In absolute terms, the introduction of AI Consult would avert diagnostic errors in 22,000 visits and treatment errors in 29,000 visits annually at Penda alone. In a survey of clinicians with AI Consult, all said that AI Consult improved the quality of care they delivered, with 75% describing the effect as "substantial". These results required a clinical workflow-aligned AI Consult implementation and active deployment to encourage clinician uptake. We hope this study demonstrates the potential for LLM-based clinical decision support tools to reduce errors in real-world settings and provides a practical framework for advancing responsible adoption.
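To make the relationship between the relative reductions (16% and 13%) and the absolute annual figures concrete, here is a minimal sketch of the arithmetic. The relative reductions come from the abstract; the baseline error rates and annual visit volume below are hypothetical placeholders for illustration, not figures reported in the study.

```python
def averted_errors(annual_visits: int, baseline_error_rate: float,
                   relative_reduction: float) -> float:
    """Errors averted per year if the per-visit error rate falls by
    `relative_reduction` (e.g. 0.16 for a 16% relative reduction)."""
    baseline_errors = annual_visits * baseline_error_rate
    return baseline_errors * relative_reduction

# HYPOTHETICAL inputs, chosen only to illustrate the calculation.
annual_visits = 400_000      # assumed annual outpatient volume
diag_rate = 0.30             # assumed baseline diagnostic error rate
treat_rate = 0.50            # assumed baseline treatment error rate

print(averted_errors(annual_visits, diag_rate, 0.16))   # diagnostic errors averted
print(averted_errors(annual_visits, treat_rate, 0.13))  # treatment errors averted
```

With the study's actual visit volume and measured baseline error rates in place of these placeholders, the same formula yields the reported estimates of roughly 22,000 averted diagnostic errors and 29,000 averted treatment errors per year.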