🤖 AI Summary
This work proposes the first multi-agent framework tailored for real-world clinical settings, addressing the limitations of existing clinical question-answering systems, which rely heavily on standardized datasets and struggle with the heterogeneity and multimodality of electronic health records (EHRs) encountered in actual hospitals. By orchestrating collaborative AI agents, the framework integrates multi-source EHR data to support patient-level question answering, temporal reasoning, and multimodal evidence fusion. The approach effectively bridges the gap between benchmark evaluation and real-world deployment, achieving 86% accuracy on authentic clinical cases while meeting stringent response-time requirements. Its clinical validity and practical utility are further confirmed through physician chart review.
📝 Abstract
Clinical decision-making increasingly relies on timely, context-aware access to patient information within Electronic Health Records (EHRs), yet most existing natural language question-answering (QA) systems are evaluated solely on benchmark datasets, limiting their practical relevance. To overcome this limitation, we introduce EHRNavigator, a multi-agent framework that harnesses collaborative AI agents to perform patient-level question answering across heterogeneous and multimodal EHR data. We assessed its performance on both public benchmark and institutional datasets under realistic hospital conditions characterized by diverse schemas, temporal reasoning demands, and multimodal evidence integration. Through quantitative evaluation and clinician-validated chart review, EHRNavigator demonstrated strong generalization, achieving 86% accuracy on real-world cases while maintaining clinically acceptable response times. Overall, these findings confirm that EHRNavigator effectively bridges the gap between benchmark evaluation and clinical deployment, offering a robust, adaptive, and efficient solution for real-world EHR question answering.