Real-time Capable Learning-based Visual Tool Pose Correction via Differentiable Simulation

📅 2025-05-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address inaccurate pose estimation caused by the lack of end-effector proprioception in minimally invasive robotic surgery, this paper proposes a real-time vision-based pose correction method. Methodologically, it introduces the first end-to-end differentiable robotic kinematic model tightly coupled with neural rendering (built upon Kaolin/DiffRend), forming a vision Transformer-driven simulation-to-real joint optimization framework that enables noise-robust self-supervised training in simulation and effective sim-to-real transfer. Experiments demonstrate single-frame inference latency under 10 ms and a 62% reduction in pose estimation error compared to joint encoder–based approaches, significantly improving both accuracy and generalization of visual pose estimation. The core contribution lies in establishing a novel differentiable kinematics–neural rendering co-design paradigm, providing a new pathway toward high-accuracy, low-latency, and robust visual pose perception for surgical robots.

Technology Category

Application Category

📝 Abstract
Autonomy in Minimally Invasive Robotic Surgery (MIRS) has the potential to reduce surgeon cognitive and task load, thereby increasing procedural efficiency. However, implementing accurate autonomous control can be difficult due to poor end-effector proprioception, a limitation of their cable-driven mechanisms. Although the robot may have joint encoders for the end-effector pose calculation, various non-idealities make the entire kinematics chain inaccurate. Modern vision-based pose estimation methods lack real-time capability or can be hard to train and generalize. In this work, we demonstrate a real-time capable, vision transformer-based pose estimation approach that is trained using end-to-end differentiable kinematics and rendering in simulation. We demonstrate the potential of this method to correct for noisy pose estimates in simulation, with the longer term goal of verifying the sim-to-real transferability of our approach.
Problem

Research questions and friction points this paper is trying to address.

Improving real-time visual tool pose correction in robotic surgery
Addressing inaccurate kinematics in cable-driven surgical robots
Enhancing vision-based pose estimation with differentiable simulation training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision transformer-based real-time pose estimation
End-to-end differentiable kinematics training
Differentiable simulation for sim-to-real transfer
🔎 Similar Papers
No similar papers found.
Shuyuan Yang
Shuyuan Yang
Xidian University
Professor
Z
Zonghe Chua
Department of Electrical, Computer, and Systems Engineering, Case Western Reserve University, Cleveland, OH 44106 USA