SutureAgent: Learning Surgical Trajectories via Goal-conditioned Offline RL in Pixel Space

📅 2026-03-19

🤖 AI Summary
This work addresses two limitations of existing approaches to suture needle trajectory prediction in endoscopic videos: they neglect sequential dependencies between actions, and sparsely annotated data gives them too little supervision. The authors formulate the task for the first time as a pixel-level sequential action learning problem. Treating the needle tip as an agent, they employ a goal-conditioned offline reinforcement learning framework that uses cubic spline interpolation to convert sparse annotations into dense rewards. The approach further incorporates autoregressive action prediction, a spatiotemporal observation encoder, and a hybrid discrete-continuous action space to capture motion continuity and physical plausibility. Evaluated on a new renal suturing dataset comprising 1,158 trajectories, the method reduces the Average Displacement Error by 58.6% compared to the strongest baseline.
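The hybrid discrete-continuous action space can be pictured with a small sketch. This is an illustration only, not the authors' implementation: the number of direction bins (`NUM_DIRECTIONS = 8`) and the helper names `encode_displacement`, `apply_action`, and `rollout` are assumptions introduced here, and the actions are hard-coded rather than produced by a learned policy.

```python
import math

# Assumption: 8 evenly spaced direction bins. The source does not state the count.
NUM_DIRECTIONS = 8


def encode_displacement(dx, dy):
    """Quantize a pixel displacement into (direction bin, continuous magnitude)."""
    bin_width = 2 * math.pi / NUM_DIRECTIONS
    angle = math.atan2(dy, dx) % (2 * math.pi)
    direction = round(angle / bin_width) % NUM_DIRECTIONS
    return direction, math.hypot(dx, dy)


def apply_action(position, direction, magnitude):
    """Move the needle-tip 'agent' one step in pixel space."""
    angle = direction * 2 * math.pi / NUM_DIRECTIONS
    x, y = position
    return (x + magnitude * math.cos(angle), y + magnitude * math.sin(angle))


def rollout(start, actions):
    """Autoregressively apply actions; each step starts from the previous waypoint."""
    waypoints = []
    position = start
    for direction, magnitude in actions:
        position = apply_action(position, direction, magnitude)
        waypoints.append(position)
    return waypoints


if __name__ == "__main__":
    # Two hypothetical steps: 5 px along +x, then 3 px along +y (image coordinates).
    print(rollout((100.0, 100.0), [(0, 5.0), (2, 3.0)]))
```

Splitting each step into a direction class and a magnitude keeps every predicted waypoint reachable from the previous one, which is one way to read the "physically plausible" transitions described above. Quantizing the direction discards some angular precision, so a finer bin count trades a harder classification problem for smaller rounding error.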
📝 Abstract
Predicting surgical needle trajectories from endoscopic video is critical for robot-assisted suturing, enabling anticipatory planning, real-time guidance, and safer motion execution. Existing methods that directly learn motion distributions from visual observations tend to overlook the sequential dependency among adjacent motion steps. Moreover, sparse waypoint annotations often fail to provide sufficient supervision, which makes supervised and imitation learning harder still. To address these challenges, we formulate image-based needle trajectory prediction as a sequential decision-making problem, in which the needle tip is treated as an agent that moves step by step in pixel space. This formulation naturally captures the continuity of needle motion and enables the explicit modeling of physically plausible pixel-wise state transitions over time. From this perspective, we propose SutureAgent, a goal-conditioned offline reinforcement learning framework that converts sparse annotations into dense reward signals via cubic spline interpolation, encouraging the policy to exploit limited expert guidance while exploring plausible future motion paths. SutureAgent encodes variable-length clips using an observation encoder to capture both local spatial cues and long-range temporal dynamics, and autoregressively predicts future waypoints through actions composed of discrete directions and continuous magnitudes. To enable stable offline policy optimization from expert demonstrations, we adopt Conservative Q-Learning with Behavioral Cloning regularization. Experiments on a new kidney wound suturing dataset containing 1,158 trajectories from 50 patients show that SutureAgent reduces Average Displacement Error by 58.6% compared with the strongest baseline, demonstrating the effectiveness of modeling needle trajectory prediction as pixel-level sequential action learning.
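To make the sparse-to-dense step concrete, the sketch below fits a natural cubic spline through a few annotated waypoints and scores a predicted path against the resulting per-frame reference. It is a minimal illustration under stated assumptions: the natural boundary condition, the negative-Euclidean-distance reward, and the function names `natural_cubic_spline`, `densify`, and `dense_rewards` are all choices made here, since the abstract does not specify the spline variant or the reward form.

```python
import math


def natural_cubic_spline(ts, ys):
    """Return per-segment coefficients (b, c, d) of a natural cubic spline."""
    n = len(ts) - 1
    h = [ts[i + 1] - ts[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal solve.
    l, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (ts[i + 1] - ts[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution; c[0] = c[n] = 0 is the natural boundary condition.
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])
    return b, c, d


def evaluate(ts, ys, coeffs, t):
    """Evaluate the spline at time t, using the segment that contains t."""
    b, c, d = coeffs
    j = len(ts) - 2
    for k in range(len(ts) - 1):
        if t <= ts[k + 1]:
            j = k
            break
    dt = t - ts[j]
    return ys[j] + b[j] * dt + c[j] * dt ** 2 + d[j] * dt ** 3


def densify(frames, waypoints):
    """Interpolate sparse (x, y) waypoints to one reference point per frame."""
    xs = [p[0] for p in waypoints]
    ys = [p[1] for p in waypoints]
    cx = natural_cubic_spline(frames, xs)
    cy = natural_cubic_spline(frames, ys)
    return [(evaluate(frames, xs, cx, f), evaluate(frames, ys, cy, f))
            for f in range(frames[0], frames[-1] + 1)]


def dense_rewards(predicted, reference):
    """Assumed reward: negative pixel distance to the interpolated reference."""
    return [-math.hypot(p[0] - r[0], p[1] - r[1])
            for p, r in zip(predicted, reference)]


if __name__ == "__main__":
    # Hypothetical annotations at frames 0, 4, and 10.
    reference = densify([0, 4, 10], [(0.0, 0.0), (8.0, 3.0), (5.0, 9.0)])
    print(len(reference), reference[4])
```

With only three annotated frames, the policy would otherwise receive feedback at three steps; after densification every intermediate frame has a reference point, so every step of an autoregressive rollout can be scored.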
Problem

Research questions and friction points this paper is trying to address.

surgical trajectory prediction
robot-assisted suturing
sequential decision-making
offline reinforcement learning
endoscopic video
Innovation

Methods, ideas, or system contributions that make the work stand out.

goal-conditioned offline RL
pixel-space trajectory prediction
surgical needle tracking
sequential decision-making
Conservative Q-Learning
Huanrong Liu
University of Macau, Macau, China
Chunlin Tian
University of Macau
MLSys
Tongyu Jia
The Chinese PLA General Hospital, Beijing, China
Tailai Zhou
The Chinese PLA General Hospital, Beijing, China
Qin Liu
University of Macau
marine policy, higher education
Yu Gao
The Chinese PLA General Hospital, Beijing, China
Yutong Ban
Shanghai Jiao Tong University, Shanghai, China
Yun Gu
Shanghai Jiao Tong University
Medical Image Analysis, Computer-Assisted Intervention
Guy Rosman
Toyota Research Institute; Massachusetts General Hospital; Duke Surgery
Computer vision and robotic perception, Bayesian inference, trajectory prediction
Xin Ma
The Chinese PLA General Hospital, Beijing, China
Qingbiao Li
University of Macau
Robot Learning, Graph Neural Networks, Imitation Learning, Reinforcement Learning, Medical Imaging