🤖 AI Summary
To address inaccurate pose estimation caused by poor end-effector proprioception in minimally invasive robotic surgery, this paper proposes a real-time, vision-based pose correction method. Methodologically, it introduces an end-to-end differentiable robotic kinematic model tightly coupled with neural rendering (built upon Kaolin/DiffRend), forming a vision-transformer-driven joint optimization framework that enables noise-robust self-supervised training in simulation, with sim-to-real transfer as the stated longer-term goal. Experiments in simulation demonstrate single-frame inference latency under 10 ms and a 62% reduction in pose estimation error compared with joint-encoder-based approaches, significantly improving both the accuracy and the generalization of visual pose estimation. The core contribution is a differentiable kinematics and neural rendering co-design that offers a path toward high-accuracy, low-latency, and robust visual pose perception for surgical robots.
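One way to make the co-design concrete is as a single image-space objective in which the correction network, the kinematic chain, and the renderer are composed and optimized together. The notation below is our own illustrative formalization of the description above, not taken from the paper:

$$
\hat{\theta} = \theta_{\mathrm{enc}} + f_{\phi}(I, \theta_{\mathrm{enc}}), \qquad
\mathcal{L}(\phi) = \big\lVert \mathcal{R}\big(\mathrm{FK}(\hat{\theta})\big) - I \big\rVert_2^2
$$

Here $\theta_{\mathrm{enc}}$ are the noisy joint-encoder readings, $f_{\phi}$ is the vision transformer predicting a pose residual from the camera image $I$, $\mathrm{FK}$ is the differentiable kinematic chain, and $\mathcal{R}$ is the differentiable renderer. Because $\mathrm{FK}$ and $\mathcal{R}$ are both differentiable, the photometric loss back-propagates end to end into $\phi$, and no ground-truth pose labels are required, which is what makes the training self-supervised.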
📝 Abstract
Autonomy in Minimally Invasive Robotic Surgery (MIRS) has the potential to reduce surgeon cognitive and task load, thereby increasing procedural efficiency. However, implementing accurate autonomous control can be difficult due to poor end-effector proprioception, a limitation of the cable-driven mechanisms of these robots. Although the robot may have joint encoders from which the end-effector pose can be computed, various non-idealities make the overall kinematic chain inaccurate. Modern vision-based pose estimation methods lack real-time capability or can be difficult to train and to generalize. In this work, we demonstrate a real-time-capable, vision-transformer-based pose estimation approach that is trained using end-to-end differentiable kinematics and rendering in simulation. We demonstrate the potential of this method to correct for noisy pose estimates in simulation, with the longer-term goal of verifying the sim-to-real transferability of our approach.
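To illustrate the training scheme end to end, here is a minimal, self-contained Python/PyTorch sketch. Everything in it is an illustrative assumption rather than the authors' implementation: a 2-link planar arm stands in for the instrument's kinematic chain, a soft Gaussian splatting function stands in for the Kaolin-based renderer, and a small MLP stands in for the vision transformer. Names such as `PoseCorrector` and `soft_render` are hypothetical.

```python
# Hypothetical sketch of self-supervised training with differentiable
# kinematics + rendering. A 2-link planar arm replaces the instrument's
# kinematic chain and Gaussian splatting replaces the neural renderer;
# none of these names come from the paper.
import torch
import torch.nn as nn

def forward_kinematics(thetas, link_lengths):
    """Differentiable FK for a planar chain: joint positions, shape (B, J+1, 2)."""
    pts = [torch.zeros(thetas.shape[0], 2)]
    angle = torch.zeros(thetas.shape[0])
    for j, length in enumerate(link_lengths):
        angle = angle + thetas[:, j]                       # cumulative link angle
        step = length * torch.stack([torch.cos(angle), torch.sin(angle)], dim=-1)
        pts.append(pts[-1] + step)
    return torch.stack(pts, dim=1)

def soft_render(points, res=32, sigma=0.05):
    """Differentiable 'renderer': splat joint positions into a soft 2D image."""
    xs = torch.linspace(-1.0, 1.0, res)
    gy, gx = torch.meshgrid(xs, xs, indexing="ij")
    grid = torch.stack([gx, gy], dim=-1).reshape(1, 1, res * res, 2)
    d2 = ((grid - points.unsqueeze(2)) ** 2).sum(-1)       # (B, J+1, res*res)
    img = torch.exp(-d2 / (2.0 * sigma ** 2)).sum(1)       # accumulate splats
    return img.reshape(-1, res, res).clamp(max=1.0)

class PoseCorrector(nn.Module):
    """MLP stand-in for the vision transformer: predicts joint-angle residuals."""
    def __init__(self, res=32, n_joints=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(res * res + n_joints, 128), nn.ReLU(),
            nn.Linear(128, n_joints),
        )

    def forward(self, image, noisy_thetas):
        return self.net(torch.cat([image.flatten(1), noisy_thetas], dim=1))

torch.manual_seed(0)
links = [0.5, 0.4]                                         # assumed link lengths
model = PoseCorrector()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    true_thetas = (torch.rand(16, 2) - 0.5) * 2.0          # random simulated poses
    with torch.no_grad():                                  # "camera" observation
        observed = soft_render(forward_kinematics(true_thetas, links))
    # Corrupt the pose, emulating cable-drive / encoder non-idealities.
    noisy = true_thetas + 0.2 * torch.randn_like(true_thetas)
    corrected = noisy + model(observed, noisy)             # predicted residual
    rendered = soft_render(forward_kinematics(corrected, links))
    loss = ((rendered - observed) ** 2).mean()             # image-space loss only
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 50 == 0:
        print(f"step {step:3d}  photometric loss {loss.item():.4f}")
```

In the paper's actual pipeline the kinematic model and renderer would be the differentiable, Kaolin-based ones named in the summary, and the observation would come from the endoscopic camera; what this sketch is meant to convey is the structure of the loop (predict a residual, re-render through differentiable kinematics, compare in image space), which never touches ground-truth pose labels.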