Surgical Agent Orchestration Platform for Voice-directed Patient Data Interaction

πŸ“… 2025-11-10
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
In da Vinci robotic surgery, surgeons face clinical challenges in hands-free operation and real-time access to multimodal patient data. Method: This paper proposes a hierarchical multi-agent, voice-driven surgical collaboration platform centered on a large language model (LLM) as the orchestrator, integrating automatic speech recognition, semantic understanding, and task orchestration to enable clinical information retrieval, CT image interaction, and 3D anatomical navigation. Contribution/Results: We introduce a novel hierarchical multi-agent architecture and a multi-level evaluation metric (MOEM), significantly enhancing robustness against speech errors and ambiguous commands while supporting autonomous intraoperative task planning and semantic reasoning. Evaluated on 240 real-world surgical voice commands, the system achieves high accuracy and task success rates, demonstrating its feasibility for efficient and reliable human–robot collaboration in minimally invasive robotic surgery.

πŸ“ Abstract
In da Vinci robotic surgery, surgeons' hands and eyes are fully engaged in the procedure, making it difficult to access and manipulate multimodal patient data without interruption. We propose a voice-directed Surgical Agent Orchestrator Platform (SAOP) built on a hierarchical multi-agent framework, consisting of an orchestration agent and three task-specific agents driven by Large Language Models (LLMs). These LLM-based agents autonomously plan, refine, validate, and reason to map voice commands into specific tasks such as retrieving clinical information, manipulating CT scans, or navigating 3D anatomical models on the surgical video. We also introduce a Multi-level Orchestration Evaluation Metric (MOEM) to comprehensively assess performance and robustness from command-level and category-level perspectives. The SAOP achieves high accuracy and success rates across 240 voice commands, while the LLM-based agents improve robustness against speech recognition errors and diverse or ambiguous free-form commands, demonstrating strong potential to support minimally invasive da Vinci robotic surgery.
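The hierarchical routing idea in the abstract (an orchestration agent dispatching a transcribed voice command to one of three task-specific agents) can be sketched minimally as below. All names are illustrative, not the paper's API, and a simple keyword matcher stands in for the LLM-based semantic understanding and planning the paper describes:

```python
# Illustrative sketch only: agent names and keyword rules are hypothetical;
# the paper uses LLM-based agents, not keyword matching, for this routing.
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class TaskAgent:
    """One task-specific agent: a name, trigger keywords, and a handler."""
    name: str
    keywords: Tuple[str, ...]
    handle: Callable[[str], str]


def clinical_info(cmd: str) -> str:
    return f"[clinical-info] retrieved record for: {cmd}"


def ct_interaction(cmd: str) -> str:
    return f"[ct-scan] adjusted view for: {cmd}"


def anatomy_nav(cmd: str) -> str:
    return f"[3d-nav] navigated model for: {cmd}"


AGENTS = [
    TaskAgent("clinical_info", ("lab", "history", "record", "allergy"), clinical_info),
    TaskAgent("ct_interaction", ("ct", "scan", "slice", "window"), ct_interaction),
    TaskAgent("anatomy_nav", ("3d", "model", "rotate", "anatomy"), anatomy_nav),
]


def orchestrate(command: str) -> str:
    """Route a (possibly noisy) transcribed voice command to a task agent."""
    lowered = command.lower()
    for agent in AGENTS:
        if any(k in lowered for k in agent.keywords):
            return agent.handle(command)
    # For ambiguous commands the paper's agents re-plan and validate;
    # this sketch simply falls back to a clarification request.
    return "[orchestrator] please rephrase the command"


print(orchestrate("show the latest CT slice"))
```

In the actual platform, the routing decision and the per-agent planning, refinement, and validation are each performed by LLM-based agents, which is what gives the system its robustness to speech recognition errors and free-form phrasing.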
Problem

Research questions and friction points this paper is trying to address.

Enables voice-controlled data access during hands-free robotic surgery
Maps voice commands to medical data retrieval and manipulation tasks
Improves robustness against speech errors and ambiguous commands
Innovation

Methods, ideas, or system contributions that make the work stand out.

Voice-directed multi-agent platform for robotic surgery
LLM-based agents autonomously map commands to tasks
Multi-level evaluation metric assesses performance robustness
Hyeryun Park
Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA.
Byung Mo Gu
Image Guided Precision Cancer Surgery Institute, College of Medicine, Korea University, Seoul, Republic of Korea.
Jun Hee Lee
Image Guided Precision Cancer Surgery Institute, College of Medicine, Korea University, Seoul, Republic of Korea.
Byeong Hyeon Choi
Image Guided Precision Cancer Surgery Institute, College of Medicine, Korea University, Seoul, Republic of Korea.
Sekeun Kim
Massachusetts General Hospital / Harvard Medical School
Medical imaging computing · Cardiovascular AI · Foundation Model · Generative Model
Hyun Koo Kim
Image Guided Precision Cancer Surgery Institute, College of Medicine, Korea University, Seoul, Republic of Korea.
Kyungsang Kim
Assistant Professor at Harvard Medical School and Mass General Hospital
Deep learning · Logical AI · Compressed sensing · Medical imaging · Optimization