🤖 AI Summary
To make conventional fixed-viewpoint multi-view photometric stereo (MVPS) deployable on mobile robotic platforms, this paper proposes a mobile robotic MVPS system together with an incremental reconstruction framework. The method uses supervised learning to jointly predict per-view surface normals, depth, and per-pixel uncertainty, refines each view's depth map through an MVPS-driven optimization, and fuses the refined depth maps with uncertainty-aware global fusion while tracking the camera pose. The framework reconstructs objects with unknown reflectance properties from at most five input images and without a laborious calibration and installation process. Evaluated on the DiLiGenT MV benchmark, it achieves results comparable to state-of-the-art MVPS methods while running roughly two orders of magnitude faster. The reconstructions are globally consistent and preserve high-frequency surface details.
📝 Abstract
Multi-View Photometric Stereo (MVPS) is a popular method for fine-detailed 3D acquisition of an object from images. Despite its outstanding results on objects of diverse materials, a typical MVPS experimental setup requires a well-calibrated light source and a monocular camera installed on an immovable base. This restricts the use of MVPS on a movable platform and prevents mobile robotics applications from benefiting from MVPS in 3D acquisition. To this end, we introduce a new mobile robotic system for MVPS. While the proposed system brings advantages, it introduces additional algorithmic challenges. To address them, we further propose an incremental approach for mobile robotic MVPS. Our approach leverages a supervised learning setup to predict per-view surface normals, object depth, and per-pixel uncertainty in the model-predicted results. A refined depth map per view is obtained by solving an MVPS-driven optimization problem proposed in this paper. We then fuse the refined depth maps while tracking the camera pose with respect to the reference frame to recover globally consistent 3D object geometry. Experimental results show the advantages of our robotic system and algorithm, which recover local high-frequency surface detail together with a globally consistent object shape. Our work goes beyond any MVPS system yet presented, providing encouraging results on objects with unknown reflectance properties using fewer frames and without a tiring calibration and installation process, enabling a computationally efficient robotic automation approach to photogrammetry. The proposed approach is nearly 100 times faster than state-of-the-art MVPS methods such as [1, 2] while maintaining similar results when tested on subjects taken from the benchmark DiLiGenT MV dataset [3].
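
The abstract does not spell out how per-pixel uncertainty enters the fusion of per-view depth maps. One common way to use such uncertainty is inverse-variance weighting, and the sketch below illustrates that idea only; it is not the paper's fusion method. The function name `fuse_depths`, the flat per-pixel lists, and the assumption that all depths have already been warped into the reference frame using the tracked camera poses are hypothetical choices made for illustration.

```python
from typing import List, Optional, Sequence


def fuse_depths(
    depths: Sequence[Sequence[Optional[float]]],
    sigmas: Sequence[Sequence[float]],
) -> List[Optional[float]]:
    """Fuse per-view depths pixel by pixel using inverse-variance weights.

    depths[v][p]: depth of pixel p seen from view v, already expressed in
                  the reference frame (assumption); None if unobserved.
    sigmas[v][p]: predicted standard deviation of that depth (must be > 0).
    """
    n_pixels = len(depths[0])
    fused: List[Optional[float]] = []
    for p in range(n_pixels):
        weight_sum = 0.0
        weighted_depth = 0.0
        for view_depths, view_sigmas in zip(depths, sigmas):
            d = view_depths[p]
            if d is None:
                continue  # this view does not observe the pixel
            w = 1.0 / (view_sigmas[p] ** 2)  # confident views get larger weight
            weight_sum += w
            weighted_depth += w * d
        # A pixel with no observations stays unknown.
        fused.append(weighted_depth / weight_sum if weight_sum > 0 else None)
    return fused


if __name__ == "__main__":
    # Two views, three pixels; the second view is more certain about pixel 0.
    depths = [[1.0, 2.0, None], [2.0, 4.0, None]]
    sigmas = [[1.0, 1.0, 1.0], [0.5, 1.0, 1.0]]
    print(fuse_depths(depths, sigmas))
```

With this weighting, a view that reports a small standard deviation pulls the fused depth toward its own estimate, while equally uncertain views are simply averaged.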