🤖 AI Summary
Existing visual SLAM systems fail in turbid underwater environments due to severe light attenuation, backscatter, and low contrast, and further lack support for multi-camera configurations. To address these challenges, this paper proposes a multimodal tightly coupled SLAM framework tailored for work-class ROVs, integrating multi-view cameras, IMU, and forward-looking sonar. Methodologically, it introduces: (i) geometric visual-inertial odometry (VIO) tightly coupled with sonar registration via joint optimization; (ii) cross-modal calibration unifying optical, inertial, and sonar coordinate frames; (iii) deep learning-driven robust feature extraction resilient to underwater degradation; and (iv) real-time semantic segmentation-guided 3D reconstruction. Evaluated in the Trondheim Fjord, the system achieves >15 Hz real-time pose estimation and centimeter-level reconstruction accuracy, substantially outperforming monocular and stereo baselines. It is the first underwater SLAM framework to support arbitrary multi-camera topologies and semantic-enhanced mapping.
📝 Abstract
Autonomous Underwater Vehicles (AUVs) and Remotely Operated Vehicles (ROVs) demand robust spatial perception capabilities, including Simultaneous Localization and Mapping (SLAM), to support both remote and autonomous tasks. Vision-based systems have been integral to these advancements, capturing rich color and texture at low cost while enabling semantic scene understanding. However, underwater conditions -- such as light attenuation, backscatter, and low contrast -- often degrade image quality to the point where traditional vision-based SLAM pipelines fail. Moreover, these pipelines typically rely on monocular or stereo inputs, limiting their scalability to the multi-camera configurations common on many vehicles. To address these issues, we propose to leverage multi-modal sensing that fuses data from multiple sensors -- including cameras, inertial measurement units (IMUs), and acoustic devices -- to enhance situational awareness and enable robust, real-time SLAM. We explore both geometric and learning-based techniques along with semantic analysis, and conduct experiments on data collected from a work-class ROV during several field deployments in the Trondheim Fjord. Through our experimental results, we demonstrate the feasibility of reliable real-time state estimation and high-quality 3D reconstruction in visually challenging underwater conditions. We also discuss system constraints and identify open research questions, such as sensor calibration and the limitations of learning-based methods, that merit further exploration to advance large-scale underwater operations.