MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification

📅 2025-12-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the performance bottleneck in cross-modal ship re-identification (ReID) caused by significant modality discrepancies between optical and SAR imagery, this paper proposes the MOS framework. First, a class-level modality alignment loss is introduced to explicitly enforce distribution consistency of intra-class ship features across modalities in the embedding space. Second, a denoising preprocessing module enhances the discriminability of SAR images. Third, high-fidelity cross-modal synthetic samples are generated via a Brownian bridge diffusion model to mitigate inter-modal data distribution shift. Finally, multi-source feature fusion is employed during inference to improve discriminative robustness. Evaluated on the HOSS dataset, MOS achieves state-of-the-art R1 accuracy under ALL-to-ALL, Optical-to-SAR, and SAR-to-Optical protocols—improving by 3.0%, 6.2%, and 16.4%, respectively, over prior methods—demonstrating significantly enhanced cross-modal matching capability under challenging maritime conditions.
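The class-level alignment idea can be illustrated with a minimal sketch: pull the per-identity centroids of optical and SAR embeddings together. This is an assumption about the loss's general shape (mean squared centroid distance); the paper's exact formulation may differ, and the function name `class_alignment_loss` is hypothetical.

```python
import numpy as np

def class_alignment_loss(feats_opt, feats_sar, labels_opt, labels_sar):
    """Hypothetical class-wise modality alignment loss: mean squared
    distance between per-identity centroids of optical and SAR
    embeddings. Identities present in only one modality are skipped."""
    labels_opt = np.asarray(labels_opt)
    labels_sar = np.asarray(labels_sar)
    shared = sorted(set(labels_opt.tolist()) & set(labels_sar.tolist()))
    dists = []
    for c in shared:
        mu_o = feats_opt[labels_opt == c].mean(axis=0)  # optical centroid
        mu_s = feats_sar[labels_sar == c].mean(axis=0)  # SAR centroid
        dists.append(np.sum((mu_o - mu_s) ** 2))
    return float(np.mean(dists))
```

When the two modalities produce identical embeddings for an identity, the loss is zero; any residual modality gap shows up as a positive centroid distance that gradient descent can shrink.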

📝 Abstract
Cross-modal ship re-identification (ReID) between optical and synthetic aperture radar (SAR) imagery has recently emerged as a critical yet underexplored task in maritime intelligence and surveillance. However, the substantial modality gap between optical and SAR images poses a major challenge for robust identification. To address this issue, we propose MOS, a novel framework designed to mitigate the optical-SAR modality gap and achieve modality-consistent feature learning for optical-SAR cross-modal ship ReID. MOS consists of two core components: (1) Modality-Consistent Representation Learning (MCRL) applies SAR image denoising and a class-wise modality alignment loss to align intra-identity feature distributions across modalities. (2) Cross-modal Data Generation and Feature fusion (CDGF) leverages a Brownian bridge diffusion model to synthesize cross-modal samples, which are subsequently fused with the original features during inference to enhance alignment and discriminability. Extensive experiments on the HOSS ReID dataset demonstrate that MOS significantly surpasses state-of-the-art methods across all evaluation protocols, achieving notable improvements of +3.0%, +6.2%, and +16.4% in R1 accuracy under the ALL-to-ALL, Optical-to-SAR, and SAR-to-Optical settings, respectively. The code and trained models will be released upon publication.
Problem

Research questions and friction points this paper is trying to address.

Mitigates optical-SAR modality gap for ship re-identification
Aligns intra-identity feature distributions across modalities
Synthesizes cross-modal samples to enhance feature alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modality-Consistent Representation Learning with alignment loss
Cross-modal Data Generation using Brownian bridge diffusion model
Feature fusion of synthetic and original samples for enhanced alignment
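The Brownian bridge component above can be sketched from its standard marginal: unlike an ordinary diffusion that ends in pure noise, a Brownian bridge is pinned to both endpoints, which is what makes it suited to translating between a fixed source image (e.g. optical) and a fixed target (e.g. SAR). The formula below is the generic bridge marginal, not the paper's exact parameterization, and `brownian_bridge_sample` is a hypothetical name.

```python
import numpy as np

def brownian_bridge_sample(x0, xT, t, rng):
    """Draw x_t from the Brownian bridge marginal between endpoints
    x0 and xT at time t in [0, 1]:
        x_t = (1 - t) * x0 + t * xT + sqrt(t * (1 - t)) * eps
    The noise term vanishes at t=0 and t=1, so the process starts
    exactly at x0 and ends exactly at xT."""
    eps = rng.standard_normal(x0.shape)
    return (1.0 - t) * x0 + t * xT + np.sqrt(t * (1.0 - t)) * eps
```

In a translation setup, a network is trained to reverse this process, so that starting from an image in one modality it walks the bridge toward a synthesized image in the other; the synthesized sample's features can then be fused with the original's at inference time.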
Yujian Zhao
School of Artificial Intelligence, Beihang University
Hankun Liu
School of Computer Science and Engineering, Beihang University
Guanglin Niu
Assistant Professor, Beihang University
artificial intelligence, natural language processing, knowledge graph, deep learning, knowledge reasoning