Anatomy Might Be All You Need: Forecasting What to Do During Surgery

📅 2025-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address real-time “what’s next?” decision-making during neurosurgical procedures, this paper introduces, for the first time, the intraoperative manual instrument future trajectory prediction task and proposes a self-supervised learning paradigm that eliminates the need for explicit trajectory annotations. Methodologically, we integrate temporal modeling of instrument pose with joint anatomical–instrument detection, trained end-to-end on transnasal transsphenoidal pituitary surgery videos. Crucially, we demonstrate that high-accuracy motion prediction can be achieved using anatomical features alone—bypassing conventional reliance on motion history. Evaluated on real surgical videos, our approach reduces 3-second trajectory prediction error by 37% over state-of-the-art baselines. This validates the critical role of anatomical priors in intelligent surgical guidance and establishes a novel paradigm for manual neurosurgical navigation.

📝 Abstract
Surgical guidance can be delivered in various ways. In neurosurgery, spatial guidance and orientation are predominantly achieved through neuronavigation systems that reference pre-operative MRI scans. Recently, there has been growing interest in providing live guidance by analyzing video feeds from tools such as endoscopes. Existing approaches, including anatomy detection, orientation feedback, phase recognition, and visual question-answering, primarily focus on helping surgeons assess the current surgical scene. This work aims to provide guidance on a finer scale by forecasting the trajectory of the surgical instrument, essentially addressing the question of what to do next. To address this task, we propose a model that leverages not only the historical locations of surgical instruments but also anatomical features. Importantly, our work does not rely on explicit ground-truth labels for instrument trajectories. Instead, the ground truth is generated by a detection model trained to detect both anatomical structures and instruments in a comprehensive dataset of pituitary surgery videos. By analyzing the interaction between anatomy and instrument movements in these videos and forecasting future instrument movements, we show that anatomical features are a valuable asset for this challenging task. To the best of our knowledge, this work is the first attempt to address this task for manually operated surgeries.
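The self-supervised setup the abstract describes — a detector's instrument positions serving as pseudo ground truth, and a forecaster trained to predict them from anatomy alone — can be sketched on toy data. Everything below (random-walk anatomy landmarks, a linear least-squares forecaster, a 3-second horizon at an assumed 10 fps) is an illustrative assumption, not the paper's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "video": per-frame (x, y) centers of 2 detected anatomy landmarks,
# drifting slowly over time (a random walk stands in for real motion).
T = 200
anatomy = 0.5 + np.cumsum(0.005 * rng.normal(size=(T, 4)), axis=0)

# Pseudo ground truth: the detector's instrument center. In this toy it is
# a noisy linear function of the landmark positions (the assumption that
# anatomy constrains where the instrument goes).
W_true = np.array([[0.6, 0.0], [0.0, 0.6], [0.3, 0.1], [0.1, 0.3]])
instrument = anatomy @ W_true + 0.01 * rng.normal(size=(T, 2))

# Forecasting task: from anatomy features at frame t, predict the
# detector-derived instrument position at t + horizon (3 s @ 10 fps).
horizon = 30
X = anatomy[:-horizon]        # anatomy-only features at time t
Y = instrument[horizon:]      # pseudo-label at time t + horizon

# Minimal "model": least-squares regression from anatomy to future position.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
pred = X @ W
err = np.linalg.norm(pred - Y, axis=1).mean()
print(f"mean forecast error (toy units): {err:.3f}")
```

The point of the sketch is the supervision signal, not the regressor: no trajectory was hand-labeled, yet the forecaster is trained end to end against detector outputs, and anatomy alone suffices as input.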
Problem

Research questions and friction points this paper is trying to address.

Neurosurgical Tool Prediction
Assisted Decision Making
Surgical Action Forecasting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neurosurgical Tool Prediction
Surgical Video Analysis
Real-time Guidance
Gary Sarwin
ETH Zurich
Medical Image Analysis
Alessandro Carretta
Department of Neurosurgery, University Hospital of Zurich, Zurich, Switzerland
Victor Staartjes
Department of Neurosurgery, University Hospital of Zurich, Zurich, Switzerland
Matteo Zoli
Department of Biomedical and Neuromotor Sciences (DIBINEM), University of Bologna, Bologna, Italy
Diego Mazzatenta
Department of Biomedical and Neuromotor Sciences (DIBINEM), University of Bologna, Bologna, Italy
Luca Regli
Department of Neurosurgery, University Hospital of Zurich, Zurich, Switzerland
Carlo Serra
Department of Neurosurgery, University Hospital of Zurich, Zurich, Switzerland
Ender Konukoglu
ETH Zurich
Medical Image Analysis
Biophysical Modeling