DUViN: Diffusion-Based Underwater Visual Navigation via Knowledge-Transferred Depth Features

📅 2025-09-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Visual navigation in unknown underwater environments faces severe challenges, including scarce labeled visual data, significant domain shift between aerial and underwater scenes, and difficulty constructing accurate maps. Method: This paper proposes a diffusion-model-based end-to-end 4-DoF visual navigation approach that enables map-free obstacle avoidance and terrain following. It employs a two-stage training framework: (i) pretraining the navigation policy on abundant aerial datasets, and (ii) adapting it to the underwater domain via cross-domain knowledge transfer, leveraging the pretrained depth feature extractor and modeling action distributions with a diffusion model to enhance robustness and generalization. Contribution/Results: Evaluated in both simulated and real underwater environments, the method substantially mitigates the impact of data scarcity, achieves stable, perception-driven navigation, and offers a new paradigm for resource-constrained autonomous underwater navigation.

📝 Abstract
Autonomous underwater navigation remains a challenging problem due to limited sensing capabilities and the difficulty of constructing accurate maps in underwater environments. In this paper, we propose a Diffusion-based Underwater Visual Navigation policy via knowledge-transferred depth features, named DUViN, which enables vision-based end-to-end 4-DoF motion control for underwater vehicles in unknown environments. DUViN guides the vehicle to avoid obstacles and maintain a safe, perception-aware altitude relative to the terrain without relying on pre-built maps. To address the difficulty of collecting large-scale underwater navigation datasets, we propose a method that ensures robust generalization under domain shifts from in-air to underwater environments by leveraging depth features and introducing a novel model transfer strategy. Specifically, our training framework consists of two phases: first, we train the diffusion-based visual navigation policy on in-air datasets using a pre-trained depth feature extractor; second, we retrain the extractor on an underwater depth estimation task and integrate the adapted extractor into the navigation policy trained in the first phase. Experiments in both simulated and real-world underwater environments demonstrate the effectiveness and generalization of our approach. The experimental videos are available at https://www.youtube.com/playlist?list=PLqt2s-RyCf1gfXJgFzKjmwIqYhrP4I-7Y.
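The two-phase transfer strategy described in the abstract can be sketched as follows. This is a minimal illustrative mock-up, not the paper's implementation: all class and method names (`DepthFeatureExtractor`, `DiffusionNavigationPolicy`, `train_on`, `act`) are hypothetical, and the diffusion denoising and depth network internals are stubbed out to show only the swap of the feature extractor between phases.

```python
# Hypothetical sketch of DUViN's two-phase training (illustrative names,
# stubbed internals; the real system uses neural networks throughout).

class DepthFeatureExtractor:
    """Stand-in for a pretrained monocular-depth backbone."""
    def __init__(self, domain):
        self.domain = domain  # "in-air" or "underwater"

    def features(self, image):
        # Real system: depth features extracted from an RGB frame.
        return [self.domain, image]


class DiffusionNavigationPolicy:
    """Stand-in for the diffusion head that denoises 4-DoF actions."""
    def __init__(self, extractor):
        self.extractor = extractor
        self.trained = False

    def train_on(self, dataset):
        # Phase 1: learn to denoise action trajectories conditioned on
        # depth features from the (in-air) extractor.
        for image in dataset:
            _ = self.extractor.features(image)
        self.trained = True

    def act(self, image):
        # Condition the (frozen) diffusion policy on current depth features.
        cond = self.extractor.features(image)
        return ("4dof_action", cond)


# Phase 1: train the policy on abundant in-air data.
policy = DiffusionNavigationPolicy(DepthFeatureExtractor("in-air"))
policy.train_on(["air_frame_0", "air_frame_1"])

# Phase 2: adapt only the extractor on an underwater depth-estimation
# task, then plug it into the trained policy; no underwater action
# labels are needed.
policy.extractor = DepthFeatureExtractor("underwater")

action = policy.act("underwater_frame")
print(action[1][0])  # prints "underwater"
```

The key design point the sketch illustrates: because the policy is conditioned on depth features rather than raw pixels, only the depth backbone needs to cross the air-to-water domain gap, while the action-generating diffusion head transfers unchanged.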
Problem

Research questions and friction points this paper is trying to address.

Autonomous underwater navigation without pre-built maps
Vision-based motion control in unknown environments
Generalization from in-air to underwater domain shifts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based visual navigation policy
Knowledge-transferred depth features
Domain adaptation from air to water