🤖 AI Summary
This study addresses the challenge of achieving high-precision pose estimation and closed-loop control for flexible endoscopic continuum manipulators, which is hindered by hysteresis, compliance, and limited distal sensing. The work proposes the first fully markerless, sensor-free visual servoing framework that integrates stereo-vision-based 6D pose estimation with position-based control. The approach leverages a multi-feature fusion network incorporating segmentation masks, keypoints, heatmaps, and bounding boxes, enhanced by feed-forward rendering-based residual refinement, photo-realistic simulation training, and a self-supervised domain adaptation strategy. Evaluated on 1,000 real-world samples, the method achieves an average pose estimation error of 0.83 mm in translation and 2.76° in rotation. In closed-loop trajectory tracking, it yields errors of 2.07 mm and 7.41°, representing reductions of 85% and 59%, respectively, compared to open-loop control.
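The position-based control loop described above can be sketched in a few lines: at each cycle, the pose error between the estimated and target poses is computed and a proportional correction is commanded. This is a minimal illustration only; the paper's actual controller, kinematic model, and gain scheduling are not specified here, and the toy plant below simply applies each commanded correction exactly.

```python
import numpy as np

def so3_log(R):
    """Axis-angle vector of a rotation matrix (minimal implementation)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-8:
        return np.zeros(3)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta / (2.0 * np.sin(theta)) * w

def so3_exp(w):
    """Rotation matrix from an axis-angle vector (Rodrigues' formula)."""
    theta = np.linalg.norm(w)
    if theta < 1e-8:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K

def pbvs_step(t_est, R_est, t_goal, R_goal, gain=0.5):
    """One proportional PBVS update: command a fraction of the pose error."""
    dt = gain * (t_goal - t_est)
    dw = gain * so3_log(R_goal @ R_est.T)
    return dt, dw

# Toy plant: the commanded correction is applied exactly each cycle.
t, R = np.array([10.0, -5.0, 3.0]), so3_exp(np.array([0.0, 0.0, 0.4]))
t_goal, R_goal = np.zeros(3), np.eye(3)
for _ in range(30):
    dt, dw = pbvs_step(t, R, t_goal, R_goal)
    t = t + dt
    R = so3_exp(dw) @ R
```

With an exact plant, both the translation and rotation errors shrink geometrically toward zero; in practice the estimated pose (with its 0.83 mm / 2.76° error) would replace the simulated state at each iteration.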
📝 Abstract
Continuum manipulators in flexible endoscopic surgical systems offer high dexterity for minimally invasive procedures; however, accurate pose estimation and closed-loop control remain challenging due to hysteresis, compliance, and limited distal sensing. Vision-based approaches reduce hardware complexity but are often constrained by limited geometric observability and high computational overhead, restricting real-time closed-loop applicability. This paper presents a unified framework for markerless stereo 6D pose estimation and position-based visual servoing of continuum manipulators. A photo-realistic simulation pipeline enables large-scale automatic training with pixel-accurate annotations. A stereo-aware multi-feature fusion network jointly exploits segmentation masks, keypoints, heatmaps, and bounding boxes to enhance geometric observability. To enforce geometric consistency without iterative optimization, a feed-forward rendering-based refinement module predicts residual pose corrections in a single pass. A self-supervised sim-to-real adaptation strategy further improves real-world performance using unlabeled data. Extensive real-world validation achieves a mean translation error of 0.83 mm and a mean rotation error of 2.76° across 1,000 samples. Markerless closed-loop visual servoing driven by the estimated pose attains accurate trajectory tracking with a mean translation error of 2.07 mm and a mean rotation error of 7.41°, corresponding to 85% and 59% reductions relative to open-loop control, and demonstrates high repeatability in repeated point-reaching tasks. To the best of our knowledge, this work presents the first fully markerless pose-estimation-driven position-based visual servoing framework for continuum manipulators, enabling precise closed-loop control without physical markers or embedded sensing.
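The single-pass refinement idea, as opposed to iterative render-and-compare, amounts to left-composing the initial pose estimate with one predicted residual transform. The sketch below illustrates only that composition; the learned residual predictor is replaced by a hypothetical stub (`stub_residual_net`), since the network's architecture and inputs are not detailed here.

```python
import numpy as np

def pose_matrix(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def stub_residual_net(rendered, observed):
    """Stand-in for the learned residual predictor.
    The real module would compare a rendering at the initial pose against the
    observed stereo images; here it returns a fixed small translation correction."""
    dR = np.eye(3)                      # no rotational correction in this toy case
    dt = np.array([0.5, -0.2, 0.1])    # hypothetical residual, in mm
    return pose_matrix(dR, dt)

def refine_once(T_init, rendered=None, observed=None):
    """Feed-forward refinement: one residual prediction, one composition, no loop."""
    dT = stub_residual_net(rendered, observed)
    return dT @ T_init

T_init = pose_matrix(np.eye(3), np.array([10.0, 0.0, 0.0]))
T_refined = refine_once(T_init)
```

The design choice the abstract emphasizes is that refinement costs a single forward pass, keeping the estimator fast enough to sit inside the servoing loop.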