Beyond Rigid AI: Towards Natural Human-Machine Symbiosis for Intraoperative Surgical Assistance

📅 2025-07-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current AI-powered surgical assistance systems suffer from rigid task definitions, fixed category priors, and dependence on dense, labor-intensive annotations—hindering dynamic, natural intraoperative human–machine interaction. To address this, we propose a memory-augmented multimodal perception agent that integrates speech-driven large language model (LLM) prompting, the Segment Anything Model (SAM), and an arbitrary-point tracking module, enabling intuitive, zero-shot segmentation of unseen surgical objects and cross-scenario generalization. The agent operates without explicit manual prompts or predefined semantic categories, supporting real-time, adaptive human–robot collaboration. Evaluated on public benchmarks, its segmentation accuracy matches expert manual annotations; on our novel in-house dataset, it successfully generalizes to previously unseen surgical instruments and simulated grafts. These results demonstrate strong zero-shot generalization capability and clinical translatability, advancing the paradigm of symbiotic human–machine surgery.

📝 Abstract
Emerging surgical data science and robotics solutions, especially those designed to provide assistance in situ, require natural human-machine interfaces to fully unlock their potential in providing adaptive and intuitive aid. Contemporary AI-driven solutions remain inherently rigid, offering limited flexibility and restricting natural human-machine interaction in dynamic surgical environments. These solutions rely heavily on extensive task-specific pre-training, fixed object categories, and explicit manual prompting. This work introduces a novel Perception Agent that leverages speech-integrated prompt-engineered large language models (LLMs), the Segment Anything Model (SAM), and any-point tracking foundation models to enable more natural human-machine interaction in real-time intraoperative surgical assistance. Incorporating a memory repository and two novel mechanisms for segmenting unseen elements, the Perception Agent offers the flexibility to segment both known and unseen elements in the surgical scene through intuitive interaction. By memorizing novel elements for use in future surgeries, this work takes a marked step towards human-machine symbiosis in surgical procedures. Through quantitative analysis on a public dataset, we show that the performance of our agent is on par with considerably more labor-intensive manual-prompting strategies. Qualitatively, we show the flexibility of our agent in segmenting novel elements (instruments, phantom grafts, and gauze) in a custom-curated dataset. By offering natural human-machine interaction and overcoming rigidity, our Perception Agent brings AI-based real-time assistance in dynamic surgical environments closer to reality.
Problem

Research questions and friction points this paper is trying to address.

Overcoming rigid AI limitations in surgical assistance
Enhancing natural human-machine interaction in surgery
Reducing reliance on task-specific pre-training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Speech-integrated LLMs for natural interaction
SAM and any-point tracking for real-time segmentation
Memory repository for memorizing novel elements
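The interaction pattern these contributions describe (a spoken request parsed into a segmentation prompt, reused from memory when the element is known, and memorized when it is new) can be sketched as a minimal control loop. This is an illustrative stand-in only: `PerceptionAgent`, `handle_request`, and the placeholder parsing and prompt-derivation steps are assumptions for exposition, not the authors' implementation, which uses an LLM, SAM, and an any-point tracker in place of the stubs below.

```python
# Hypothetical sketch of the Perception Agent loop: speech request -> target
# name -> point prompt (from memory if known, newly derived if unseen) ->
# segmentation handle. All components here are illustrative stubs.
from dataclasses import dataclass, field


@dataclass
class PerceptionAgent:
    # Memory repository: element name -> memorized point prompt.
    memory: dict = field(default_factory=dict)

    def handle_request(self, utterance: str, frame_id: str) -> dict:
        # Stand-in for LLM-based parsing of the spoken request.
        target = utterance.lower().removeprefix("segment the ").strip()
        if target in self.memory:
            # Known element: reuse the memorized prompt from a past surgery.
            points = self.memory[target]
        else:
            # Unseen element: derive a fresh point prompt (placeholder for
            # the paper's zero-shot mechanisms) and memorize it.
            points = [(0, 0)]
            self.memory[target] = points
        # Stand-in for a SAM mask propagated by any-point tracking.
        return {"target": target, "points": points, "frame": frame_id}


agent = PerceptionAgent()
first = agent.handle_request("segment the gauze", "frame_001")
repeat = agent.handle_request("segment the gauze", "frame_002")
```

On the second request the agent finds "gauze" in its memory repository and reuses the stored prompt rather than re-deriving it, which is the behavior the memory mechanism is meant to provide across frames and future procedures.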
Lalithkumar Seenivasan
Johns Hopkins University | National University of Singapore (PhD)
Healthcare Automation, Medical AI, Medical Robotics, Surgical Data Science
Jiru Xu
Johns Hopkins University, Baltimore MD, USA
Roger D. Soberanis Mukul
Johns Hopkins University, Baltimore MD, USA
Hao Ding
Johns Hopkins University, Baltimore MD, USA
Grayson Byrd
Johns Hopkins University, Baltimore MD, USA; Johns Hopkins Applied Physics Laboratory, Laurel MD, USA
Yu-Chun Ku
Johns Hopkins University
Digital Twins, Robotics, Augmented Reality, Virtual Reality, Mixed Reality
Jose L. Porras
Johns Hopkins Medical Institutions, Baltimore MD, USA
Masaru Ishii
Johns Hopkins Medical Institutions, Baltimore MD, USA
Mathias Unberath
Johns Hopkins University
Medical Robotics, Computer Vision, AI/ML, Extended Reality, HCI