AI Summary
Existing robotic ultrasound research primarily focuses on binary interactions, either patient-robot or clinician-robot, neglecting communication gaps in triadic clinician-robot-patient collaboration. This paper introduces the Intelligent Virtual Sonographer (IVS), the first multimodal virtual agent designed for extended reality (XR) environments. IVS integrates large language models, automatic speech recognition, text-to-speech synthesis, and robotic control to enable natural-language-driven real-time triadic interaction. It accurately interprets clinician commands to control the ultrasound robot while simultaneously providing empathetic, transparent verbal explanations of procedures to patients. Experimental evaluation demonstrates that IVS significantly improves procedural efficiency, clinician-patient trust, and patient experience. By bridging semantic misalignment in human-robot-patient collaborative diagnosis and intervention, IVS establishes a foundational technical framework for context-aware, linguistically grounded medical robotics in XR.
Abstract
The advancement and maturity of large language models (LLMs) and robotics have unlocked vast potential for human-computer interaction, particularly in the field of robotic ultrasound. While existing research primarily focuses on either patient-robot or physician-robot interaction, the role of an intelligent virtual sonographer (IVS) bridging physician-robot-patient communication remains underexplored. This work introduces a conversational virtual agent in Extended Reality (XR) that facilitates real-time interaction between physicians, a robotic ultrasound system (RUS), and patients. The IVS agent communicates with physicians in a professional manner while offering empathetic explanations and reassurance to patients. Furthermore, it actively controls the RUS by executing physician commands and transparently relays these actions to the patient. By integrating LLM-powered dialogue with speech-to-text, text-to-speech, and robotic control, our system enhances the efficiency, clarity, and accessibility of robotic ultrasound acquisition. This work constitutes a first step toward understanding how an IVS can bridge communication gaps in physician-robot-patient interaction, giving physicians more control over, and therefore more trust in, physician-robot interaction while improving patient experience and acceptance of robotic ultrasound.
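The dual-audience behavior described in the abstract (one physician command yields a robot action, a professional acknowledgement, and a plain-language explanation for the patient) can be illustrated with a minimal sketch. This is not the authors' implementation: a simple keyword matcher stands in for the LLM, the speech-to-text and text-to-speech stages are omitted, and every name here (`RobotAction`, `AgentTurn`, `interpret_command`, `handle_physician_utterance`, the action labels) is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# All names below are hypothetical; the paper does not specify an API.

@dataclass
class RobotAction:
    name: str                        # illustrative label, e.g. "move_probe" or "stop"
    direction: Optional[str] = None  # only set for movement actions


@dataclass
class AgentTurn:
    action: Optional[RobotAction]    # None when no command was recognized
    to_physician: str                # concise, professional register
    to_patient: str                  # plain-language, reassuring register


DIRECTIONS = ("left", "right", "up", "down")


def interpret_command(text: str) -> Optional[RobotAction]:
    """Keyword matcher standing in for the LLM's command interpretation."""
    words = re.findall(r"[a-z]+", text.lower())
    if "stop" in words:
        return RobotAction("stop")
    if "move" in words:
        for direction in DIRECTIONS:
            if direction in words:
                return RobotAction("move_probe", direction)
    return None


def handle_physician_utterance(text: str) -> AgentTurn:
    """Map one transcribed physician utterance to an action and two messages."""
    action = interpret_command(text)
    if action is None:
        return AgentTurn(
            None,
            "Command not recognized; please rephrase.",
            "The doctor and I are confirming the next step. Nothing is moving right now.",
        )
    if action.name == "stop":
        return AgentTurn(
            action,
            "Stopping the probe.",
            "The probe is stopping now. You are doing well.",
        )
    return AgentTurn(
        action,
        f"Moving the probe {action.direction}.",
        f"The probe will now move slightly {action.direction}. You may feel light pressure.",
    )


if __name__ == "__main__":
    turn = handle_physician_utterance("Move the probe left, please.")
    print(turn.action)
    print("Physician:", turn.to_physician)
    print("Patient:  ", turn.to_patient)
```

Returning both messages alongside the action in a single `AgentTurn` reflects the transparency goal stated in the abstract: the patient-facing explanation is produced from the same interpreted command that drives the robot, so the two cannot drift apart.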