SurgicalVLM-Agent: Towards an Interactive AI Co-Pilot for Pituitary Surgery

📅 2025-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Image-guided pituitary surgery demands an intraoperative AI co-pilot capable of dynamic interaction and task planning, yet existing static models lack support for multimodal, real-time decision-making in this complex neurosurgical context. Method: The authors introduce PitAgent, a surgical context-aware multimodal dataset designed for structured task planning in pituitary surgery, and propose FFT-GaLore, an FFT-based gradient projection technique for efficient low-rank adaptation of LLaMA 3.2. The end-to-end system integrates a vision-language model (VLM) for multimodal task orchestration with anatomical segmentation, preoperative–intraoperative image registration, surgical instrument tracking, and surgical visual question answering (VQA). Contribution/Results: Experiments demonstrate state-of-the-art performance in surgical task planning and prompt generation, and zero-shot surgical VQA yields highly semantically meaningful responses, supporting interactive, AI-driven assistance in intraoperative settings.

📝 Abstract
Image-guided surgery demands adaptive, real-time decision support, yet static AI models struggle with structured task planning and providing interactive guidance. Large vision-language models (VLMs) offer a promising solution by enabling dynamic task planning and predictive decision support. We introduce SurgicalVLM-Agent, an AI co-pilot for image-guided pituitary surgery, capable of conversation, planning, and task execution. The agent dynamically processes surgeon queries and plans tasks such as MRI tumor segmentation, endoscope anatomy segmentation, overlaying preoperative imaging with intraoperative views, instrument tracking, and surgical visual question answering (VQA). To enable structured task planning, we develop the PitAgent dataset, a surgical context-aware dataset covering segmentation, overlaying, instrument localization, tool tracking, tool-tissue interactions, phase identification, and surgical activity recognition. Additionally, we propose FFT-GaLore, a fast Fourier transform (FFT)-based gradient projection technique for efficient low-rank adaptation, optimizing fine-tuning of LLaMA 3.2 for surgical environments. We validate SurgicalVLM-Agent by assessing task planning and prompt generation on our PitAgent dataset and evaluating zero-shot VQA using a public pituitary dataset. Results demonstrate state-of-the-art performance in task planning and query interpretation, with highly semantically meaningful VQA responses, advancing AI-driven surgical assistance.
Problem

Research questions and friction points this paper is trying to address.

Develops AI co-pilot for real-time pituitary surgery decision support.
Enables dynamic task planning and interactive guidance in surgery.
Optimizes fine-tuning for surgical AI using FFT-based gradient projection.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic task planning using vision-language models
FFT-based gradient projection for efficient fine-tuning
Interactive AI co-pilot for real-time surgical guidance
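To make the low-rank adaptation idea concrete: GaLore-style methods compress the gradient of a weight matrix into a low-rank subspace before the optimizer update, and FFT-GaLore replaces the usual SVD-derived basis with a Fourier basis. The sketch below is a minimal, hypothetical illustration of FFT-based low-rank gradient projection using NumPy; the function name, the energy-based frequency selection, and the rank choice are assumptions for illustration, not the paper's actual FFT-GaLore algorithm.

```python
import numpy as np

def fft_lowrank_project(grad, rank):
    """Approximate a gradient matrix with a low-rank (band-limited) version
    using an FFT basis instead of the SVD used by standard GaLore.
    Hypothetical sketch -- not the paper's exact method."""
    # Real FFT along the first axis: each column becomes frequency coefficients.
    spectrum = np.fft.rfft(grad, axis=0)            # shape (m//2 + 1, n), complex
    # Keep the `rank` frequency bins with the most total energy across columns.
    energy = np.abs(spectrum).sum(axis=1)
    keep = np.argsort(energy)[-rank:]
    mask = np.zeros_like(spectrum)
    mask[keep] = spectrum[keep]
    # Inverse FFT reconstructs the low-rank approximation of the gradient.
    return np.fft.irfft(mask, n=grad.shape[0], axis=0)

rng = np.random.default_rng(0)
g = rng.standard_normal((64, 16))    # stand-in for a weight gradient
g_lr = fft_lowrank_project(g, rank=8)
```

The appeal of an FFT basis over SVD is cost: selecting the projection takes O(mn log m) via the FFT rather than the cubic cost of a decomposition, which matters when adapting a model like LLaMA 3.2 under tight compute budgets.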
Jiayuan Huang
UCL Hawkes Institute, University College London, UK; Dept of Medical Physics & Biomedical Engineering, University College London, UK
Runlong He
UCL Hawkes Institute, University College London, UK; Dept of Medical Physics & Biomedical Engineering, University College London, UK
Danyal Z. Khan
UCL Hawkes Institute, University College London, UK; Dept of Neurosurgery, National Hospital for Neurology and Neurosurgery, UK
Evangelos Mazomenos
Associate Professor, University College London
Computer-Assisted Interventions; Surgical Data Science; Surgical Robotics; Biomedical Signal Process
Danail Stoyanov
Professor of Robot Vision, University College London
Surgical Vision; Surgical AI; Surgical Robotics; Computer Assisted Interventions; Surgical Data Science
Hani J. Marcus
UCL Hawkes Institute, University College London, UK; Dept of Neurosurgery, National Hospital for Neurology and Neurosurgery, UK
Matthew J. Clarkson
Professor of Biomedical Engineering at University College London
Image Guided Surgery; Medical Image Computing; Image Registration; Computer Vision
Mobarakol Islam
UCL Hawkes Institute, University College London, UK; Dept of Medical Physics & Biomedical Engineering, University College London, UK