Data-centric Design of Learning-based Surgical Gaze Perception Models in Multi-Task Simulation

📅 2026-02-09
🤖 AI Summary
This study addresses the high cost of collecting expert active gaze data in robot-assisted minimally invasive surgery and the unclear impact of skill level and perceptual modality (active execution versus passive observation) on attention modeling. The authors present the first paired active-passive, multi-task surgical gaze dataset, recording operators' active gaze on the da Vinci SimNow simulator integrated with VR-based eye tracking, then reusing the identical videos as stimuli to collect passive gaze from observers. Through gaze density overlap analysis and saliency modeling (MSI-Net, SalGAN), the work enables the first controlled comparison of active and passive gaze within the same surgical scenes. Results show that passive gaze effectively approximates the active attention of intermediate-level operators, and that novice-derived passive labels incur only limited performance degradation when applied to high-quality demonstrations, supporting the feasibility and scalability of crowd-sourced gaze annotation.

📝 Abstract
In robot-assisted minimally invasive surgery (RMIS), reduced haptic feedback and depth cues increase reliance on expert visual perception, motivating gaze-guided training and learning-based surgical perception models. However, operative expert gaze is costly to collect, and it remains unclear how the source of gaze supervision, both expertise level (intermediate vs. novice) and perceptual modality (active execution vs. passive viewing), shapes what attention models learn. We introduce a paired active-passive, multi-task surgical gaze dataset collected on the da Vinci SimNow simulator across four drills. Active gaze was recorded during task execution using a VR headset with eye tracking, and the corresponding videos were reused as stimuli to collect passive gaze from observers, enabling controlled same-video comparisons. We quantify skill- and modality-dependent differences in gaze organization and evaluate the substitutability of passive gaze for operative supervision using fixation density overlap analyses and single-frame saliency modeling. Across settings, MSI-Net produced stable, interpretable predictions, whereas SalGAN was unstable and often poorly aligned with human fixations. Models trained on passive gaze recovered a substantial portion of intermediate active attention, but with predictable degradation, and transfer was asymmetric between active and passive targets. Notably, novice passive labels approximated intermediate-passive targets with limited loss on higher-quality demonstrations, suggesting a practical path for scalable, crowd-sourced gaze supervision in surgical coaching and perception modeling.
Problem

Research questions and friction points this paper is trying to address.

surgical gaze perception
gaze supervision
active vs. passive viewing
multi-task simulation
learning-based models
Innovation

Methods, ideas, or system contributions that make the work stand out.

surgical gaze perception
active-passive gaze dataset
data-centric learning
multi-task simulation
crowd-sourced supervision
Yizhou Li
Department of Electrical, Computer, and Systems Engineering, Case Western Reserve University, Cleveland, OH 44106 USA
Shuyuan Yang
Xidian University
Professor
Jiaji Su
Department of Electrical, Computer, and Systems Engineering, Case Western Reserve University, Cleveland, OH 44106 USA
Zonghe Chua
Department of Electrical, Computer, and Systems Engineering, Case Western Reserve University, Cleveland, OH 44106 USA