🤖 AI Summary
This study addresses the challenge of accurately registering preoperative CT scans to intraoperative laparoscopic video in minimally invasive surgery by introducing reinforcement learning to 3D/2D liver registration for the first time, formulating registration as a sequential decision-making process. The proposed method employs a shared feature encoder to extract geometric features from both CT renderings and laparoscopic images, and a discrete-action reinforcement learning policy network that iteratively selects six-degree-of-freedom rigid transformations and autonomously decides when to terminate, eliminating the need for predefined step sizes or stopping criteria. Supervised pretraining warm-starts the policy, improving convergence speed and stability. Evaluated on a public dataset, the approach achieves a mean target registration error of 15.70 mm, matching the accuracy of supervised methods that rely on optimization-based post-processing while offering faster inference and fully automated operation.
📝 Abstract
Registration between preoperative CT and intraoperative laparoscopic video plays a crucial role in augmented reality (AR) guidance for minimally invasive surgery. Learning-based methods have recently achieved registration errors comparable to optimization-based approaches while offering faster inference. However, many supervised methods produce coarse alignments that rely on additional optimization-based refinement, thereby increasing inference time. We present a discrete-action reinforcement learning (RL) framework that formulates CT-to-video registration as a sequential decision-making process. A shared feature encoder, warm-started from a supervised pose estimation network to provide stable geometric features and faster convergence, extracts representations from CT renderings and laparoscopic frames, while an RL policy head learns to choose rigid transformations along six degrees of freedom and to decide when to stop the iteration. Experiments on a public laparoscopic dataset show that our method achieves an average target registration error (TRE) of 15.70 mm, comparable to supervised approaches with optimization-based refinement, while converging faster. The proposed RL-based formulation enables automated, efficient iterative registration without manually tuned step sizes or stopping criteria. This discrete framework provides a practical foundation for future continuous-action and deformable registration models in surgical AR applications.
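To make the discrete-action formulation concrete, the sketch below shows one plausible shape of the action space and iteration loop: a perturbation of fixed magnitude along each of the six rigid degrees of freedom (in both directions) plus an explicit stop action. The step sizes, action encoding, and the `toy_policy` are illustrative assumptions for this sketch, not the paper's actual values; in the paper the policy is a learned network over encoded image features rather than a hand-written rule.

```python
import numpy as np

# Assumed discrete action set: +/- a fixed step along each of the six rigid
# DoF (3 rotations, 3 translations), plus an explicit STOP action.
# Step sizes are illustrative placeholders, not values from the paper.
ROT_STEP_DEG = 1.0
TRANS_STEP_MM = 2.0
STOP = 12  # actions 0..11 move the pose; action 12 terminates the episode

def apply_action(pose, action):
    """pose: 6-vector [rx, ry, rz, tx, ty, tz]; action: int in 0..12."""
    if action == STOP:
        return pose, True
    dof, sign = divmod(action, 2)  # actions 0..11 -> (DoF index, direction)
    step = ROT_STEP_DEG if dof < 3 else TRANS_STEP_MM
    new_pose = pose.copy()
    new_pose[dof] += step if sign == 0 else -step
    return new_pose, False

def register(init_pose, policy, max_steps=50):
    """Iteratively refine the pose until the policy chooses STOP."""
    pose = np.asarray(init_pose, dtype=float)
    for _ in range(max_steps):
        action = policy(pose)  # in the paper: a learned RL policy network
        pose, done = apply_action(pose, action)
        if done:
            break
    return pose

# Toy stand-in policy: greedily reduce the largest pose error toward a
# known zero target, stopping once every error is below one step.
def toy_policy(pose):
    err = -pose
    steps = [ROT_STEP_DEG] * 3 + [TRANS_STEP_MM] * 3
    dof = int(np.argmax(np.abs(err)))
    if abs(err[dof]) < steps[dof]:
        return STOP
    return dof * 2 + (0 if err[dof] > 0 else 1)

final = register([3.0, 0, 0, -6.0, 0, 0], toy_policy)
```

Because the stop decision is itself an action, the loop needs no hand-tuned convergence threshold, which is the property the abstract highlights.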