🤖 AI Summary
To address the limited robustness of environmental perception for autonomous vessels under complex maritime conditions, this paper proposes a multimodal, multi-view deep fusion method that generates high-accuracy, all-weather bird’s-eye view (BEV) representations of the vessel’s surroundings. We introduce a novel cross-modal cross-attention Transformer architecture that unifies heterogeneous sensor inputs—including RGB, thermal infrared, sparse LiDAR, X-band radar, and electronic navigational charts (ENC). A multi-view feature alignment mechanism and an end-to-end joint training strategy ensure geometric consistency and semantic richness in the BEV representation. Sea trials on a real autonomous vessel demonstrate that the method maintains stable perception performance under adverse weather and dynamic sea states. Quantitatively, navigation and localization error is reduced by 37%, significantly enhancing system robustness and operational safety.
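To make the fusion idea concrete, here is a minimal sketch of cross-modal cross-attention in PyTorch, where learnable BEV queries attend jointly to token sequences from the different sensors. This is an illustration of the general technique described above, not the paper's actual implementation; the class name `BEVCrossAttentionFusion`, the token shapes, and all hyperparameters are assumptions.

```python
# Illustrative sketch of cross-modal cross-attention BEV fusion (PyTorch).
# All names, shapes, and hyperparameters here are assumptions, not the
# paper's released code.
import torch
import torch.nn as nn

class BEVCrossAttentionFusion(nn.Module):
    """Learnable BEV queries attend to tokens from heterogeneous sensors."""
    def __init__(self, dim=256, heads=8, bev_h=64, bev_w=64, n_modalities=5):
        super().__init__()
        # One query per BEV grid cell; the grid is flattened into a sequence.
        self.bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, dim))
        # Per-modality projections map RGB / thermal / LiDAR / radar / ENC
        # features into a shared embedding space before fusion.
        self.proj = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_modalities))
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, modality_tokens):
        # modality_tokens: list of (B, N_i, dim) sequences, one per sensor.
        tokens = torch.cat([p(t) for p, t in zip(self.proj, modality_tokens)], dim=1)
        b = tokens.size(0)
        q = self.bev_queries.unsqueeze(0).expand(b, -1, -1)
        # BEV queries (Q) attend jointly to all sensor tokens (K, V).
        fused, _ = self.cross_attn(q, tokens, tokens)
        fused = self.norm(q + fused)
        return fused + self.ffn(fused)  # (B, bev_h * bev_w, dim)

# Usage with dummy features standing in for the five sensor streams:
fusion = BEVCrossAttentionFusion()
feats = [torch.randn(2, 100, 256) for _ in range(5)]
bev = fusion(feats)  # (2, 4096, 256), reshapeable to a 64x64 BEV grid
```

In this formulation, each BEV cell pools evidence from every modality in a single attention step; the paper's multi-view alignment and joint training would sit on top of such a fusion core.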
📝 Abstract
We propose a cross-attention Transformer-based method for multimodal sensor fusion that builds a bird's-eye view of a vessel's surroundings, supporting safer autonomous marine navigation. The model deeply fuses multi-view RGB and long-wave infrared images with sparse LiDAR point clouds. Training also integrates X-band radar and electronic chart data to inform predictions. The resulting view provides a detailed, reliable scene representation that improves navigational accuracy and robustness. Real-world sea trials confirm the method's effectiveness even in adverse weather and complex maritime settings.