🤖 AI Summary
To address the limited robustness of environmental perception for autonomous vessels under complex maritime conditions, this paper proposes a multimodal, multi-view deep fusion method that generates high-accuracy, all-weather bird’s-eye view (BEV) representations of the vessel’s surroundings. We introduce a novel cross-modal cross-attention Transformer architecture that unifies heterogeneous sensor inputs—including RGB, thermal infrared, sparse LiDAR, X-band radar, and electronic navigational charts (ENC). A multi-view feature alignment mechanism and an end-to-end joint training strategy ensure geometric consistency and semantic richness in the BEV representation. Sea trials on a real autonomous vessel demonstrate that the method maintains stable perception performance under adverse weather and dynamic sea states. Quantitatively, navigation and localization error is reduced by 37%, significantly enhancing system robustness and operational safety.
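To make the fusion idea concrete, here is a minimal sketch of cross-modal cross-attention in PyTorch, where learnable BEV queries attend jointly to token sequences from the different sensors. This is an illustration of the general technique described above, not the paper's actual implementation; the class name `BEVCrossAttentionFusion`, the token shapes, and all hyperparameters are assumptions.

```python
# Illustrative sketch of cross-modal cross-attention BEV fusion (PyTorch).
# All names, shapes, and hyperparameters here are assumptions, not the
# paper's released code.
import torch
import torch.nn as nn

class BEVCrossAttentionFusion(nn.Module):
    """Learnable BEV queries attend to tokens from heterogeneous sensors."""
    def __init__(self, dim=256, heads=8, bev_h=64, bev_w=64, n_modalities=5):
        super().__init__()
        # One query per BEV grid cell; the grid is flattened into a sequence.
        self.bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, dim))
        # Per-modality projections map RGB / thermal / LiDAR / radar / ENC
        # features into a shared embedding space before fusion.
        self.proj = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_modalities))
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, modality_tokens):
        # modality_tokens: list of (B, N_i, dim) sequences, one per sensor.
        tokens = torch.cat([p(t) for p, t in zip(self.proj, modality_tokens)], dim=1)
        b = tokens.size(0)
        q = self.bev_queries.unsqueeze(0).expand(b, -1, -1)
        # BEV queries (Q) attend jointly to all sensor tokens (K, V).
        fused, _ = self.cross_attn(q, tokens, tokens)
        fused = self.norm(q + fused)
        return fused + self.ffn(fused)  # (B, bev_h * bev_w, dim)

# Usage with dummy features standing in for the five sensor streams:
fusion = BEVCrossAttentionFusion()
feats = [torch.randn(2, 100, 256) for _ in range(5)]
bev = fusion(feats)  # (2, 4096, 256), reshapeable to a 64x64 BEV grid
```

In this formulation, each BEV cell pools evidence from every modality in a single attention step; the paper's multi-view alignment and joint training would sit on top of such a fusion core.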
📝 Abstract
We propose a cross-attention Transformer-based method for multimodal sensor fusion that builds a bird's-eye view of a vessel's surroundings, supporting safer autonomous marine navigation. The model deeply fuses multi-view RGB and long-wave infrared images with sparse LiDAR point clouds. Training also integrates X-band radar and electronic chart data to inform predictions. The resulting view provides a detailed, reliable scene representation that improves navigational accuracy and robustness. Real-world sea trials confirm the method's effectiveness even in adverse weather and complex maritime settings.