🤖 AI Summary
Monocular depth estimation (MDE) in surgical robotics suffers from high uncertainty and low accuracy in challenging scenes, where textureless surfaces, specular reflections, and occlusions are common.
Method: The paper proposes ProbeMDE, a cost-aware active sensing framework that fuses RGB images with sparse proprioceptive depth measurements, obtained where the robot has touched the environment in a known configuration. An ensemble of MDE models predicts dense depth maps, and the ensemble's variance quantifies predictive uncertainty. The gradient of that uncertainty with respect to candidate measurement locations guides where to touch next, and Stein variational gradient descent (SVGD) over this gradient map selects maximally informative locations while preventing mode collapse.
Results: In simulated and physical experiments on central airway obstruction surgical phantoms, the method outperforms baseline methods across standard depth estimation metrics, achieving higher accuracy while minimizing the number of proprioceptive measurements required.
📝 Abstract
Monocular depth estimation (MDE) provides a useful tool for robotic perception, but its predictions are often uncertain and inaccurate in challenging environments such as surgical scenes, where textureless surfaces, specular reflections, and occlusions are common. To address this, we propose ProbeMDE, a cost-aware active sensing framework that combines RGB images with sparse proprioceptive measurements for MDE. Our approach uses an ensemble of MDE models to predict dense depth maps conditioned on both RGB images and a sparse set of known depth measurements obtained via proprioception, that is, at points where the robot has touched the environment in a known configuration. We quantify predictive uncertainty via the ensemble's variance and measure the gradient of the uncertainty with respect to candidate measurement locations. To prevent mode collapse while selecting maximally informative locations to propriocept (touch), we leverage Stein Variational Gradient Descent (SVGD) over this gradient map. We validate our method in both simulated and physical experiments on central airway obstruction surgical phantoms. Our results show that ProbeMDE outperforms baseline methods across standard depth estimation metrics, achieving higher accuracy while minimizing the number of required proprioceptive measurements.
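The selection loop described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: the function names (`ensemble_uncertainty`, `score_from_variance`, `svgd_step`) are hypothetical, the kernel is a fixed-bandwidth RBF, and the gradient of the uncertainty is approximated by finite differences of the log-variance map on the pixel grid, whereas the abstract describes a gradient with respect to candidate measurement locations without giving details. The sketch treats the variance map as an unnormalized density over touch locations, so that SVGD pulls candidate points toward high-uncertainty regions while the kernel's repulsive term keeps them from collapsing onto a single point.

```python
import numpy as np

def ensemble_uncertainty(depth_preds):
    """Per-pixel mean and variance over an ensemble of depth maps.

    depth_preds: array of shape (M, H, W), one dense depth map per model.
    """
    return depth_preds.mean(axis=0), depth_preds.var(axis=0)

def score_from_variance(var_map, particles, eps=1e-6):
    """Finite-difference gradient of log-variance, sampled at each particle.

    Assumption: the variance map is treated as an unnormalized density
    over (row, col) touch locations. particles has shape (n, 2).
    """
    g_row, g_col = np.gradient(np.log(var_map + eps))
    upper = np.array(var_map.shape) - 1
    idx = np.clip(np.rint(particles).astype(int), 0, upper)
    rows, cols = idx[:, 0], idx[:, 1]
    return np.stack([g_row[rows, cols], g_col[rows, cols]], axis=1)

def svgd_step(particles, score, bandwidth=1.0, step_size=0.1):
    """One SVGD update with an RBF kernel.

    particles: (n, d) candidate locations; score: (n, d) gradient of the
    log target evaluated at each particle.
    """
    n = particles.shape[0]
    diff = particles[:, None, :] - particles[None, :, :]  # diff[i, j] = x_i - x_j
    k = np.exp(-(diff ** 2).sum(axis=-1) / (2.0 * bandwidth ** 2))
    attract = k @ score  # kernel-weighted scores pull toward high uncertainty
    repulse = (k[:, :, None] * diff).sum(axis=1) / bandwidth ** 2  # pushes particles apart
    return particles + step_size * (attract + repulse) / n

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    preds = rng.normal(size=(5, 32, 32))          # stand-in for 5 ensemble members
    _, var_map = ensemble_uncertainty(preds)
    particles = rng.uniform(0, 31, size=(4, 2))   # 4 candidate touch locations
    for _ in range(20):
        score = score_from_variance(var_map, particles)
        particles = svgd_step(particles, score, bandwidth=3.0, step_size=0.5)
        particles = np.clip(particles, 0, 31)     # keep candidates inside the image
    print(particles)
```

With a zero score, the update reduces to the repulsive term alone, which is the mechanism that prevents several candidate touches from converging on the same uncertain pixel.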