Towards physician-centered oversight of conversational diagnostic AI

📅 2025-07-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the lack of effective asynchronous clinical oversight and clear accountability for diagnostic conversational AI. It proposes g-AMIE (guardrailed-AMIE), a physician-overseen framework that decouples patient intake from oversight: a multi-agent system conducts history-taking within guardrails and abstains from giving individualized diagnoses or treatment recommendations, while an overseeing primary care physician (PCP) reviews its assessments asynchronously in a clinician cockpit interface and retains accountability for the clinical decision. The framework was evaluated in a randomized, blinded virtual Objective Structured Clinical Examination (OSCE). Across 60 clinical scenarios, g-AMIE outperformed nurse practitioners/physician assistants (NPs/PAs) and a group of PCPs working under the same guardrails in intake quality, case summarization, and the diagnoses and management plans proposed for PCP review. PCP oversight of g-AMIE was also more time-efficient than standalone PCP consultations in prior work, with a reported 37% improvement in review efficiency.

📝 Abstract
Recent work has demonstrated the promise of conversational AI systems for diagnostic dialogue. However, real-world assurance of patient safety means that providing individual diagnoses and treatment plans is considered a regulated activity reserved for licensed professionals. Furthermore, physicians commonly oversee other team members in such activities, including nurse practitioners (NPs) or physician assistants/associates (PAs). Inspired by this, we propose a framework for effective, asynchronous oversight of the Articulate Medical Intelligence Explorer (AMIE) AI system. We propose guardrailed-AMIE (g-AMIE), a multi-agent system that performs history taking within guardrails, abstaining from individualized medical advice. Afterwards, g-AMIE conveys assessments to an overseeing primary care physician (PCP) in a clinician cockpit interface. The PCP provides oversight and retains accountability for the clinical decision. This effectively decouples oversight from intake, so oversight can happen asynchronously. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) of text consultations with asynchronous oversight, we compared g-AMIE to NPs/PAs and to a group of PCPs under the same guardrails. Across 60 scenarios, g-AMIE outperformed both groups in performing high-quality intake, summarizing cases, and proposing diagnoses and management plans for the overseeing PCP to review. This resulted in higher-quality composite decisions. PCP oversight of g-AMIE was also more time-efficient than standalone PCP consultations in prior work. While our study does not replicate existing clinical practices and likely underestimates clinicians' capabilities, our results demonstrate the promise of asynchronous oversight as a feasible paradigm for diagnostic AI systems to operate under expert human oversight for enhancing real-world care.

Problem

Research questions and friction points this paper is trying to address.

Ensuring physician oversight of AI diagnostic systems
Developing guardrails for AI to abstain from medical advice
Improving efficiency and quality of asynchronous clinical decisions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent system with guardrails for safety
Clinician cockpit interface for physician oversight
Asynchronous oversight improves decision quality

Elahe Vedadi
Google DeepMind
AI, Distributed Computing, Information Theory, Secure & Private Computing
David Barrett
Google DeepMind
Natalie Harris
Google Research
Ellery Wulczyn
Staff Software Engineer, Google Research
Applied Machine Learning, Healthcare, Digital Pathology
Shashir Reddy
Engineer, Google, Inc.
Roma Ruparel
Unknown affiliation
Mike Schaekermann
Computer Science PhD, Eng BSc, Medicine State Exam I
Human-Computer Interaction, Machine Learning, Medicine
Tim Strother
Google DeepMind
Deep Learning, Machine Learning
Ryutaro Tanno
Research Scientist, Google DeepMind
Machine Learning, Deep Learning, Healthcare, Computer Vision
Yash Sharma
Google Research
Jihyeon Lee
Google Research
Cían Hughes
Google Research
Dylan Slack
Google DeepMind
deep learning, natural language processing, robustness
Anil Palepu
PhD Student, Harvard-MIT Health Science & Technology
Jan Freyberg
Google DeepMind
Khaled Saab
Google DeepMind
Valentin Liévin
Google DeepMind
machine learning, healthcare
Wei-Hung Weng
Google DeepMind
artificial intelligence, machine learning, natural language processing, medical imaging, healthcare
Tao Tu
Columbia University, Google
multi-modal neuroimaging, machine learning, neural information processing
Yun Liu
Google Research
Nenad Tomasev
Google DeepMind
artificial intelligence, machine learning, stochastic optimization, artificial life, bioinformatics
Kavita Kulkarni
Google Research
S. Sara Mahdavi
Google DeepMind
Biomedical interventions in medicine, Machine Learning
Kelvin Guu
Principal Research Scientist / Director, Google DeepMind
Deep Learning, Artificial Intelligence, Machine Learning, Natural Language Processing, Statistics
Joëlle Barral
Google DeepMind