SurgAtt-Tracker: Online Surgical Attention Tracking via Temporal Proposal Reranking and Motion-Aware Refinement

📅 2026-02-24
🤖 AI Summary
Existing methods for visual guidance in minimally invasive surgery often conflate visual attention estimation with camera control or rely on object-centric assumptions, which hinders stable and accurate field-of-view (FoV) guidance. This work formulates surgical attention tracking as a spatio-temporal learning problem and introduces SurgAtt-Tracker, a framework that combines temporal proposal reranking with motion-aware refinement to produce per-frame dense attention heatmaps, enabling continuous and interpretable visual guidance. The study also contributes SurgAtt-1.16M, a large-scale benchmark with a clinically grounded annotation protocol that supports heatmap-based attention analysis across procedures and institutions, and reports state-of-the-art performance on multiple surgical datasets. The approach remains robust under occlusion, multi-instrument interference, and cross-domain settings, and its frame-wise output can directly support downstream robotic FoV planning and automatic camera control.

📝 Abstract
Accurate and stable field-of-view (FoV) guidance is critical for safe and efficient minimally invasive surgery, yet existing approaches often conflate visual attention estimation with downstream camera control or rely on direct object-centric assumptions. In this work, we formulate surgical attention tracking as a spatio-temporal learning problem and model surgeon focus as a dense attention heatmap, enabling continuous and interpretable frame-wise FoV guidance. We propose SurgAtt-Tracker, a holistic framework that robustly tracks surgical attention by exploiting temporal coherence through proposal-level reranking and motion-aware refinement, rather than direct regression. To support systematic training and evaluation, we introduce SurgAtt-1.16M, a large-scale benchmark with a clinically grounded annotation protocol that enables comprehensive heatmap-based attention analysis across procedures and institutions. Extensive experiments on multiple surgical datasets demonstrate that SurgAtt-Tracker consistently achieves state-of-the-art performance and strong robustness under occlusion, multi-instrument interference, and cross-domain settings. Beyond attention tracking, our approach provides a frame-wise FoV guidance signal that can directly support downstream robotic FoV planning and automatic camera control.
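To make the idea of proposal-level reranking with temporal coherence more concrete, here is a minimal sketch. It is not the SurgAtt-Tracker implementation; the abstract does not give the scoring function, so the Gaussian temporal prior, the `sigma` parameter, and the function names `heatmap_center` and `rerank_proposals` are all illustrative assumptions. The sketch reduces a dense attention heatmap to a single FoV target point and picks, among several candidate centers, the one that balances its own confidence against closeness to the previous frame's attention center.

```python
import math


def heatmap_center(heatmap):
    """Return the attention-weighted centroid (x, y) of a 2D heatmap.

    `heatmap` is a list of rows with non-negative values (hypothetical format).
    """
    total = sum(sum(row) for row in heatmap)
    if total <= 0:
        raise ValueError("heatmap has no attention mass")
    cx = sum(x * v for row in heatmap for x, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(heatmap) for v in row) / total
    return cx, cy


def rerank_proposals(centers, scores, prev_center, sigma=20.0):
    """Return the index of the best proposal under an assumed temporal prior.

    Each proposal's confidence is multiplied by a Gaussian weight that decays
    with its distance from the previous frame's attention center. This scoring
    rule is an assumption made for illustration only.
    """
    best_index, best_value = -1, -math.inf
    for i, ((x, y), score) in enumerate(zip(centers, scores)):
        d2 = (x - prev_center[0]) ** 2 + (y - prev_center[1]) ** 2
        value = score * math.exp(-d2 / (2.0 * sigma ** 2))
        if value > best_value:
            best_index, best_value = i, value
    return best_index


if __name__ == "__main__":
    # A single hot pixel at column 3, row 2.
    frame = [[0.0] * 5 for _ in range(5)]
    frame[2][3] = 1.0
    prev = heatmap_center(frame)

    # Two candidate attention centers; the second has higher raw confidence
    # but lies far from the previous center.
    candidates = [(4.0, 2.0), (100.0, 100.0)]
    confidences = [0.6, 0.9]
    print(prev, rerank_proposals(candidates, confidences, prev))
```

In this toy setting the distant proposal loses despite its higher raw confidence, which is the kind of temporal stability that reranking is meant to provide compared with picking the highest-scoring candidate in each frame independently.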
Problem

Research questions and friction points this paper is trying to address.

surgical attention tracking
field-of-view guidance
minimally invasive surgery
visual attention estimation
camera control
Innovation

Methods, ideas, or system contributions that make the work stand out.

surgical attention tracking
temporal proposal reranking
motion-aware refinement
attention heatmap
field-of-view guidance
Rulin Zhou
The Chinese University of Hong Kong Shenzhen Research Institute
Deep Learning, Medical Image Processing
Guankun Wang
The Chinese University of Hong Kong
Computer vision, Image analysis
An Wang
The Chinese University of Hong Kong
Medical Image Analysis, Surgical Scene Perception, Multimodal AI
Yujie Ma
Tsinghua University
product design, supply chain
Lixin Ouyang
College of Mechatronics and Control Engineering, Shenzhen University, China
Bolin Cui
Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
Junyan Li
UMass Amherst
Foundation Models, Efficient AI
Chaowei Zhu
Division of Gastrointestinal Surgery, Shenzhen People’s Hospital, China
Mingyang Li
Associate Professor, Industrial and Management Systems Engineering, The University of South Florida
data science, reliability and quality, system informatics, complex systems modeling and optimization, computational intelligence
Ming Chen
College of Mechatronics and Control Engineering, Shenzhen University, China
Xiaopin Zhong
College of Mechatronics and Control Engineering, Shenzhen University, China
Peng Lu
Department of Mechanical Engineering, The University of Hong Kong, Hong Kong SAR, China
Jiankun Wang
Southern University of Science and Technology
Robotics, Path Planning, Motion Control, Human-Robot Interaction
Xianming Liu
Division of Gastrointestinal Surgery, Shenzhen People’s Hospital, China
Hongliang Ren
Chinese University of Hong Kong | National University of Singapore | JHU/Harvard(RF) | CUHK(PhD)
Biorobotics & intelligent systems, medical mechatronics, continuum, soft flexible robots/sensors, multisensory perception