🤖 AI Summary
First-person (egocentric) vision limits an operator's situational awareness during underwater ROV teleoperation. To address this, the paper proposes a geometry-driven, closed-form ego-to-exocentric view synthesis method that requires no training data and generalizes across scenes, so it can be integrated with existing monocular SLAM-based ROV systems. The approach combines real-time monocular SLAM pose estimation with lightweight 3D geometric modeling to generate dynamic exocentric views from past egocentric frames, including in low-light conditions. It is validated on 2-DOF indoor navigation and 6-DOF underwater cave exploration. A subjective evaluation with 15 operators indicates improved control accuracy and spatial understanding. Notably, the method enables, for the first time, navigation guided by cave survey lines using dynamically synthesized exocentric viewpoints. The core contribution is a zero-shot, real-time view synthesis framework driven by geometric priors, which avoids the dependence on training data and specific scenes that is typical of learning-based methods.
📝 Abstract
Underwater ROVs (Remotely Operated Vehicles) are unmanned submersible vehicles designed for exploring and operating in the depths of the ocean. Despite using high-end cameras, typical teleoperation engines based on first-person (egocentric) views limit a surface operator's ability to maneuver the ROV in complex deep-water missions. In this paper, we present an interactive teleoperation interface that enhances operational capabilities through increased situational awareness. This is accomplished by (i) offering on-demand "third"-person (exocentric) visuals from past egocentric views, and (ii) providing enhanced peripheral information with augmented ROV pose information in real time. We achieve this by integrating a 3D geometry-based Ego-to-Exo view synthesis algorithm into a monocular SLAM system for accurate trajectory estimation. The proposed closed-form solution uses only past egocentric views from the ROV and a SLAM backbone for pose estimation, which makes it portable to existing ROV platforms. Unlike data-driven solutions, it is invariant to applications and waterbody-specific scenes. We validate the geometric accuracy of the proposed framework through extensive experiments on 2-DOF indoor navigation and 6-DOF underwater cave exploration in challenging low-light conditions. A subjective evaluation with 15 human teleoperators further confirms the effectiveness of the integrated features for improved teleoperation. We demonstrate the benefits of dynamic Ego-to-Exo view generation and real-time pose rendering for remote ROV teleoperation by following navigation guides such as cavelines inside underwater caves. This new way of interactive ROV teleoperation opens up promising opportunities for future research in subsea telerobotics.
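The abstract does not spell out the closed-form solution, so the following is only a minimal sketch of the general geometric idea behind reusing a past egocentric frame as an exocentric view. It assumes a pinhole camera with intrinsics `K`, camera-to-world SLAM poses given as 4x4 matrices, and a simple "closest to image centre" selection rule. The function names, the selection rule, and the `min_depth` threshold are all hypothetical and are not taken from the paper.

```python
import numpy as np


def world_to_camera(T_wc, p_w):
    """Express a world point in a camera frame.

    T_wc is an assumed 4x4 camera-to-world pose (as a SLAM system might output).
    """
    R, t = T_wc[:3, :3], T_wc[:3, 3]
    return R.T @ (p_w - t)


def project_point(K, p_c):
    """Pinhole projection of a camera-frame point; None if it is not in front."""
    if p_c[2] <= 0:
        return None
    uvw = K @ p_c
    return uvw[:2] / uvw[2]


def pick_exo_frame(past_poses, p_w_rov, K, image_size, min_depth=0.5):
    """Pick the past frame in which the current ROV position is most central.

    Hypothetical selection rule: skip frames where the ROV is behind the camera,
    closer than min_depth, or outside the image; among the rest, minimise the
    pixel distance to the image centre. Returns (index, pixel) or (None, None).
    """
    width, height = image_size
    centre = np.array([width / 2.0, height / 2.0])
    best_idx, best_uv, best_score = None, None, np.inf
    for idx, T_wc in enumerate(past_poses):
        p_c = world_to_camera(T_wc, p_w_rov)
        if p_c[2] < min_depth:
            continue
        uv = project_point(K, p_c)
        if uv is None:
            continue
        if not (0 <= uv[0] < width and 0 <= uv[1] < height):
            continue
        score = np.linalg.norm(uv - centre)
        if score < best_score:
            best_idx, best_uv, best_score = idx, uv, score
    return best_idx, best_uv
```

In an interface of this kind, the returned pixel would be where an ROV marker or rendered model is overlaid on the chosen past image. A real system would also need to account for the ROV's orientation, scale ambiguity in monocular SLAM, and occlusion, none of which this sketch handles.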