Multimodal and Multiview Deep Fusion for Autonomous Marine Navigation

📅 2025-05-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address insufficient environmental perception robustness of autonomous vessels under complex maritime conditions, this paper proposes a multimodal, multi-view deep fusion method to generate high-accuracy, all-weather bird’s-eye view (BEV) representations of the vessel’s surroundings. We introduce a novel cross-modal cross-attention Transformer architecture that unifies heterogeneous sensor inputs—including RGB, thermal infrared, sparse LiDAR, X-band radar, and electronic navigational charts (ENC). A multi-view feature alignment mechanism and end-to-end joint training strategy are incorporated to ensure geometric consistency and semantic richness in the BEV representation. Sea trials on a real autonomous vessel demonstrate that the method maintains stable perception performance under adverse weather and dynamic sea states. Quantitatively, navigation and localization error is reduced by 37%, significantly enhancing system robustness and operational safety.

📝 Abstract
We propose a cross-attention Transformer-based method for multimodal sensor fusion that builds a bird's-eye view of a vessel's surroundings, supporting safer autonomous marine navigation. The model deeply fuses multi-view RGB and long-wave infrared images with sparse LiDAR point clouds. Training also integrates X-band radar and electronic chart data to inform predictions. The resulting view provides a detailed, reliable scene representation, improving navigational accuracy and robustness. Real-world sea trials confirm the method's effectiveness even in adverse weather and complex maritime settings.
Problem

Research questions and friction points this paper is trying to address.

Fuse multimodal sensors for marine navigation
Integrate RGB, infrared, LiDAR, radar, and chart data
Improve accuracy and robustness in adverse conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-attention Transformer for multimodal sensor fusion
Deep fusion of RGB, infrared, and LiDAR data
Integration of radar and chart data during training
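To make the core fusion idea concrete, the sketch below shows a single cross-modal attention step in the spirit the summary describes: BEV grid cells act as queries that attend to feature tokens from another sensor modality (e.g. LiDAR). This is an illustrative NumPy sketch, not the paper's implementation; the random projection matrices stand in for learned weights, and all dimensions (100 BEV cells, 50 sensor tokens, 32-d head) are hypothetical toy values.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(bev_queries, sensor_feats, d_k=32, seed=0):
    """One cross-attention step: BEV grid cells (queries) attend to
    feature tokens from another modality (keys/values)."""
    rng = np.random.default_rng(seed)
    dq = bev_queries.shape[-1]
    ds = sensor_feats.shape[-1]
    # Random matrices stand in for learned Q/K/V projections.
    Wq = rng.standard_normal((dq, d_k)) / np.sqrt(dq)
    Wk = rng.standard_normal((ds, d_k)) / np.sqrt(ds)
    Wv = rng.standard_normal((ds, d_k)) / np.sqrt(ds)
    Q, K, V = bev_queries @ Wq, sensor_feats @ Wk, sensor_feats @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (n_bev, n_sensor) weights
    return attn @ V                          # fused BEV features

# Toy shapes: 100 BEV cells (64-d) attend over 50 LiDAR tokens (48-d).
bev = np.random.default_rng(1).standard_normal((100, 64))
lidar = np.random.default_rng(2).standard_normal((50, 48))
fused = cross_modal_attention(bev, lidar)
print(fused.shape)  # (100, 32)
```

In a full multi-sensor system, one such step per modality (infrared, radar, chart features) would write into the same BEV query grid, which is what keeps the fused representation geometrically consistent across sensors.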