🤖 AI Summary
To address the performance bottleneck in cross-modal ship re-identification (ReID) caused by significant modality discrepancies between optical and SAR imagery, this paper proposes the MOS framework. First, a class-level modality alignment loss is introduced to explicitly enforce distribution consistency of intra-class ship features across modalities in the embedding space. Second, a denoising preprocessing module enhances the discriminability of SAR images. Third, high-fidelity cross-modal synthetic samples are generated via a Brownian bridge diffusion model to mitigate inter-modal data distribution shift. Finally, multi-source feature fusion is employed during inference to improve discriminative robustness. Evaluated on the HOSS dataset, MOS achieves state-of-the-art R1 accuracy under ALL-to-ALL, Optical-to-SAR, and SAR-to-Optical protocols—improving by 3.0%, 6.2%, and 16.4%, respectively, over prior methods—demonstrating significantly enhanced cross-modal matching capability under challenging maritime conditions.
📝 Abstract
Cross-modal ship re-identification (ReID) between optical and synthetic aperture radar (SAR) imagery has recently emerged as a critical yet underexplored task in maritime intelligence and surveillance. However, the substantial modality gap between optical and SAR images poses a major challenge for robust identification. To address this issue, we propose MOS, a novel framework designed to mitigate the optical-SAR modality gap and achieve modality-consistent feature learning for optical-SAR cross-modal ship ReID. MOS consists of two core components: (1) Modality-Consistent Representation Learning (MCRL) applies SAR image denoising and a class-wise modality alignment loss to align intra-identity feature distributions across modalities. (2) Cross-modal Data Generation and Feature Fusion (CDGF) leverages a Brownian bridge diffusion model to synthesize cross-modal samples, which are then fused with the original features during inference to enhance alignment and discriminability. Extensive experiments on the HOSS ReID dataset demonstrate that MOS significantly surpasses state-of-the-art methods across all evaluation protocols, achieving notable improvements of +3.0%, +6.2%, and +16.4% in R1 accuracy under the ALL-to-ALL, Optical-to-SAR, and SAR-to-Optical settings, respectively. The code and trained models will be released upon publication.
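The paper does not give the exact form of its class-wise modality alignment loss, but the idea described above (enforcing consistency of intra-identity feature distributions across modalities) can be sketched as pulling together, for each identity, the centroids of its optical and SAR embeddings. The function name and the squared-L2 centroid distance below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def class_wise_modality_alignment_loss(features, labels, modalities):
    """Hypothetical sketch of a class-wise modality alignment loss.

    For each identity, compute the mean (centroid) of its optical features
    and of its SAR features, and penalize the squared L2 distance between
    the two centroids. Averaged over identities seen in both modalities.

    features:   (N, D) array of embeddings
    labels:     (N,) integer identity labels
    modalities: (N,) array, 0 = optical, 1 = SAR
    """
    loss, count = 0.0, 0
    for pid in np.unique(labels):
        opt = features[(labels == pid) & (modalities == 0)]
        sar = features[(labels == pid) & (modalities == 1)]
        if len(opt) == 0 or len(sar) == 0:
            continue  # identity not observed in both modalities in this batch
        diff = opt.mean(axis=0) - sar.mean(axis=0)
        loss += float(diff @ diff)
        count += 1
    return loss / max(count, 1)
```

In a real training loop this term would be computed on a differentiable tensor backend and added to the identity (classification/metric) loss; when the optical and SAR centroids of every identity coincide, the term vanishes.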