🤖 AI Summary
This work addresses the severe visual degradation in underwater environments, which compromises the localization accuracy and dense reconstruction quality of conventional visual SLAM systems. To overcome these limitations, we propose a multimodal underwater SLAM framework that tightly fuses stereo cameras, an IMU, and a 3D sonar sensor. Our approach introduces an online coarse-to-fine extrinsic calibration method between the sonar and camera, along with a photometric rendering strategy for sonar point clouds, enabling effective integration of visual and sonar data. The system delivers real-time, high-precision six-degree-of-freedom localization and high-fidelity dense 3D reconstruction. Experimental results in both controlled tank and open-lake environments demonstrate superior robustness and accuracy compared to existing underwater and visual SLAM methods, achieving reconstruction quality comparable to offline approaches while maintaining real-time performance.
📝 Abstract
Visual challenges in underwater environments significantly hinder the accuracy of vision-based localisation and high-fidelity dense reconstruction. In this paper, we propose VISO, a robust underwater SLAM system that fuses a stereo camera, an inertial measurement unit (IMU), and a 3D sonar to achieve accurate 6-DoF localisation and enable efficient dense 3D reconstruction with high photometric fidelity. We introduce a coarse-to-fine online calibration approach for extrinsic parameter estimation between the 3D sonar and the camera. Additionally, a photometric rendering strategy is proposed for the 3D sonar point cloud to enrich the sonar map with visual information. Extensive experiments in a laboratory tank and an open lake demonstrate that VISO surpasses current state-of-the-art underwater and visual SLAM algorithms in terms of localisation robustness and accuracy, while also delivering real-time dense 3D reconstruction with quality comparable to offline dense mapping methods.