Towards Conversational Medical AI with Eyes, Ears and a Voice

📅 2026-05-09
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses a limitation of existing medical dialogue AI systems: because they rely solely on text, they miss the audiovisual cues present in real clinical encounters and cannot provide real-time multimodal decision support. The authors propose AI co-clinician, described as a first-of-its-kind real-time medical dialogue system that processes continuous audio-visual input. Built on the low-latency voice and video capabilities of Gemini, its dual-agent architecture balances deep clinical reasoning with the low latency needed for natural conversation. The work introduces a video-based interface emulating telemedicine consultations, 20 standardized outpatient scenarios, and the TelePACES evaluation criteria with case-specific rubrics. In 120 simulated encounters, the system approached primary care physicians on key TelePACES dimensions such as management plans and differential diagnosis, and significantly outperformed GPT-Realtime across all general criteria. It matched physicians on case-specific triage measures, but physicians remained superior in overall case-specific assessments, and gaps remain in physical examination and disease-specific reasoning.
📝 Abstract
The practice of medicine relies not only upon skillful dialogue but also on the nuanced exchange and interpretation of rich auditory and visual cues between doctors and patients. Building on the low-latency voice and video processing capabilities of Gemini, we introduce AI co-clinician, a first-of-its-kind conversational AI system utilizing continuous streams of audio-visual data from live patient conversations to inform real-time clinical decisions. Its dual-agent architecture balances deep clinical reasoning with the low latency required for natural dialogue. To assess this system, we implemented a video-based interface emulating telemedicine consultations. We crafted 20 standardized outpatient scenarios requiring proactive real-time auditory and visual reasoning and designed "TelePACES" evaluation criteria alongside case-specific rubrics. In a randomized, interface-blinded, crossover simulation study (n = 120 encounters) with 10 internal medicine residents as patient actors, we compared AI co-clinician with primary care physicians (PCPs), GPT-Realtime, and a baseline agent. AI co-clinician approached PCPs in key TelePACES dimensions, including management plans and differential diagnosis, while significantly outperforming GPT-Realtime across all general criteria. While our agent demonstrated parity with PCPs in case-specific triage measures, physicians maintained superior overall performance in case-specific assessments. Although AI co-clinician marks a significant advance in real-time telemedical AI, gaps remain in physical examination and disease-specific reasoning. Our work shows that text-only approaches fail to capture the true challenges of medical consultation and suggests that high-stakes real-time diagnostic AI is most safely advanced in collaborative, triadic models where AI can be a supportive co-clinician for doctors and patients.
Problem

Research questions and friction points this paper is trying to address.

conversational medical AI
audio-visual cues
real-time clinical decision
telemedicine
multimodal reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal medical AI
real-time clinical reasoning
audio-visual dialogue system
dual-agent architecture
TelePACES evaluation
Meet Shah
Google DeepMind
Jason Gusdorf
Beth Israel Deaconess Medical Center, Harvard Medical School
Anil Palepu
PhD Student, Harvard-MIT Health Science & Technology
Chunjong Park
Google DeepMind
Jack W. O'Sullivan
Stanford University
Vishnu Ravi
Stanford University
Tim Strother
Google DeepMind
Deep Learning, Machine Learning
Pavel Dubov
Google DeepMind
Aliya Rysbek
Google DeepMind
Toshiyuki Fukuzawa
Google DeepMind
Yana Lunts
Google DeepMind
Jan Freyberg
Google DeepMind
Michael B. Chang
Research Scientist, Google DeepMind
Artificial Intelligence, Deep Learning, Machine Learning, Deep Reinforcement Learning
Aniruddh Raghu
MIT
David Stutz
Research Scientist, DeepMind
deep learning, ai agents, ai for science, uncertainty estimation, computer vision
Devora Berlowitz
Google DeepMind
Eliseo Papa
Google DeepMind
Taylan Cemgil
Google DeepMind
JD Velasquez
Google DeepMind
Jack Chen
Google DeepMind
Arthur Chen
Google DeepMind
Doug Fritz
Google DeepMind
Charlie Taylor
Google DeepMind
Katya Tregubova
Google DeepMind
Jing Rong Lim
Google DeepMind