AI Summary
To address long-term maritime surveillance challenges arising from spatiotemporal misalignment, multi-scale targets, and dynamic changes in multimodal remote sensing data, this paper introduces SMART-Ship, the first synchronized multimodal remote sensing dataset specifically designed for berthed ships. It comprises five modalities: visible-light, synthetic aperture radar (SAR), panchromatic, multi-spectral, and near-infrared, all rigorously co-registered in both space and time with pixel-level alignment. A hierarchical annotation scheme is proposed, including instance-level polygon localization, fine-grained categories, unique vessel identifiers, and change-region masks. The dataset contains 1,092 image groups and 38,838 annotated ships, supporting five fundamental tasks: recognition, detection, cross-modal matching, change analysis, and tracking. Experimental results demonstrate that multimodal fusion methods significantly outperform unimodal baselines in vessel recognition and change detection. SMART-Ship establishes a standardized benchmark and methodological framework for multimodal remote sensing interpretation of maritime targets.
Abstract
Given the limitations of satellite orbits and imaging conditions, multi-modal remote sensing (RS) data is crucial for enabling long-term earth observation. However, maritime surveillance remains challenging due to the complexity of multi-scale targets and dynamic environments. To bridge this critical gap, we propose a Synchronized Multi-modal Aligned Remote sensing Targets dataset for berthed ships analysis (SMART-Ship), containing spatiotemporally registered images with fine-grained annotations for maritime targets from five modalities: visible-light, synthetic aperture radar (SAR), panchromatic, multi-spectral, and near-infrared. Specifically, our dataset consists of 1,092 multi-modal image sets covering 38,838 ships. Each image set is acquired within one week and registered to ensure spatiotemporal consistency. Ship instances in each set are annotated with polygonal location information, fine-grained categories, instance-level identifiers, and change-region masks, organized hierarchically to support diverse multi-modal RS tasks. Furthermore, we define standardized benchmarks on five fundamental tasks and comprehensively compare representative methods across the dataset. Thorough experimental evaluations validate that the proposed SMART-Ship dataset can support various multi-modal RS interpretation tasks and reveal promising directions for further exploration.
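To make the hierarchical annotation scheme concrete, the following is a minimal sketch of what one annotation record could look like, inferred only from the levels named in the abstract (polygonal location, fine-grained category, instance-level identifier, change-region mask). All field names, the category label, and the helper function are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical annotation record for one modality within a registered image set.
# The hierarchy mirrors the abstract: image set -> modality image -> ship instances.
annotation = {
    "image_set_id": "set_0001",        # one spatiotemporally registered group
    "modality": "SAR",                 # visible-light, SAR, panchromatic,
                                       # multi-spectral, or near-infrared
    "instances": [
        {
            "instance_id": "ship_0042",   # identifier shared across modalities/time
            "category": "cargo_ship",     # fine-grained class (illustrative)
            "polygon": [(102, 310), (180, 305), (185, 360), (108, 366)],
            "change_mask": None,          # mask reference when a change is annotated
        }
    ],
}

def instance_ids(record):
    """Collect instance identifiers, e.g. as input to cross-modal matching
    or tracking, which rely on IDs being consistent across images."""
    return [inst["instance_id"] for inst in record["instances"]]

print(instance_ids(annotation))
```

Because identifiers persist across modalities and acquisition times, matching records by `instance_id` is what would link the recognition, change-analysis, and tracking tasks over the same physical ship.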