Beyond Weight Adaptation: Feature-Space Domain Injection for Cross-Modal Ship Re-Identification

📅 2025-12-23
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Cross-modal ship re-identification (CMS Re-ID) faces fundamental challenges, including large inter-modal discrepancies and a heavy reliance on large-scale paired data for pretraining. To address these, this work departs from conventional weight-space adaptation paradigms and proposes a feature-space dynamic domain adaptation framework, grounded in the Platonic Representation Hypothesis and operating on a fully frozen vision foundation model. Its core contribution is Domain Representation Injection (DRI): a lightweight offset encoder extracts domain-specific representations from the raw input, and a context-aware modulator adaptively transforms them before additive fusion into intermediate layers, reshaping the feature distribution without altering the pretrained weights. The method is highly parameter-efficient (only 1.54M/7.05M trainable parameters) and generalizes well: on the HOSS-ReID benchmark it attains 57.9% and 60.5% mAP, respectively, setting a new state of the art.

πŸ“ Abstract
Cross-Modality Ship Re-Identification (CMS Re-ID) is critical for achieving all-day and all-weather maritime target tracking, yet it is fundamentally challenged by significant modality discrepancies. Mainstream solutions typically rely on explicit modality alignment strategies; however, this paradigm heavily depends on constructing large-scale paired datasets for pre-training. To address this, grounded in the Platonic Representation Hypothesis, we explore the potential of Vision Foundation Models (VFMs) in bridging modality gaps. Recognizing the suboptimal performance of existing generic Parameter-Efficient Fine-Tuning (PEFT) methods that operate within the weight space, particularly on limited-capacity models, we shift the optimization perspective to the feature space and propose a novel PEFT strategy termed Domain Representation Injection (DRI). Specifically, while keeping the VFM fully frozen to maximize the preservation of general knowledge, we design a lightweight, learnable Offset Encoder to extract domain-specific representations rich in modality and identity attributes from raw inputs. Guided by the contextual information of intermediate features at different layers, a Modulator adaptively transforms these representations. Subsequently, they are injected into the intermediate layers via additive fusion, dynamically reshaping the feature distribution to adapt to the downstream task without altering the VFM's pre-trained weights. Extensive experimental results demonstrate the superiority of our method, achieving State-of-the-Art (SOTA) performance with minimal trainable parameters. For instance, on the HOSS-ReID dataset, we attain 57.9% and 60.5% mAP using only 1.54M and 7.05M parameters, respectively. The code is available at https://github.com/TingfengXian/DRI.
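The injection mechanism described in the abstract (offset encoder → context-guided modulation → additive fusion into frozen intermediate layers) can be sketched numerically. This is a minimal illustration with random stand-in weights, assuming a generic stack of frozen blocks; the variable names (`W_off`, `W_mod`), the sigmoid gating form, and all dimensions are assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

D_in, D = 48, 64   # raw-input dim and backbone feature dim (illustrative sizes)
N_layers = 4       # depth of the frozen backbone (illustrative)

# Frozen "foundation model": fixed random linear blocks stand in for ViT layers.
# These weights are never updated, mirroring the fully frozen VFM.
W_frozen = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_layers)]

# Lightweight trainable Offset Encoder: maps the raw input to a
# domain-specific representation (a single linear map here, for simplicity).
W_off = rng.standard_normal((D_in, D)) * 0.01

# Context-aware Modulator: per-layer gates conditioned on the current
# intermediate feature, so the injected offset adapts to each layer's context.
W_mod = [rng.standard_normal((D, D)) * 0.01 for _ in range(N_layers)]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x_raw, h0):
    """One DRI-style forward pass: inject a modulated offset at each layer."""
    offset = x_raw @ W_off                # domain representation from raw input
    h = h0
    for W, Wm in zip(W_frozen, W_mod):
        h = np.tanh(h @ W)                # frozen backbone block (untouched)
        gate = sigmoid(h @ Wm)            # context-aware modulation of the offset
        h = h + gate * offset             # additive fusion reshapes the feature
    return h

x_raw = rng.standard_normal(D_in)         # raw input seen by the offset encoder
h0 = rng.standard_normal(D)               # initial backbone feature
out = forward(x_raw, h0)
```

In training, only `W_off` and `W_mod` would receive gradients; the frozen blocks preserve the VFM's general knowledge while the injected offsets adapt the feature distribution to the cross-modal task.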
Problem

Research questions and friction points this paper is trying to address.

Addresses modality gaps in cross-modal ship re-identification
Proposes feature-space domain injection to avoid weight adaptation
Enables adaptation with minimal trainable parameters on frozen models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses frozen Vision Foundation Models with lightweight learnable encoder
Injects domain-specific representations into intermediate feature layers
Adaptively reshapes feature distribution without altering pre-trained weights
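The parameter-efficiency claim behind these bullets (only the injection modules train while the backbone stays frozen) can be made concrete with a back-of-the-envelope count. All dimensions below are illustrative assumptions (a ViT-Base-like backbone, one linear modulator per layer), not the paper's reported configuration.

```python
# Rough trainable-vs-frozen parameter count for a DRI-style setup.
# All dimensions are illustrative assumptions, not the paper's config.
d_model, n_layers = 768, 12        # ViT-Base-like frozen backbone (assumed)
d_raw = 3 * 16 * 16                # one flattened RGB patch as encoder input (assumed)

# Frozen backbone: rough per-block count (QKV + output projection ~ 4*d^2,
# plus a 4x-expansion MLP ~ 8*d^2), ignoring norms and biases.
frozen_params = n_layers * (4 * d_model**2 + 8 * d_model**2)

# Trainable DRI modules: one offset encoder plus one modulator per layer.
offset_encoder = d_raw * d_model
modulators = n_layers * d_model * d_model
trainable_params = offset_encoder + modulators

ratio = trainable_params / (trainable_params + frozen_params)
print(f"trainable: {trainable_params/1e6:.2f}M vs. {frozen_params/1e6:.1f}M frozen "
      f"({ratio:.1%} of total)")
```

Even this crude estimate puts the trainable share below a tenth of the model, which is the regime the reported 1.54M/7.05M figures occupy relative to a full foundation model.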
Tingfeng Xian
School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510641, Guangdong, China, and also with Key Laboratory of Big Data and Intelligent Robot, Ministry of Education, South China University of Technology, Guangzhou 510641, China
Wenlve Zhou
South China University of Technology
Artificial Intelligence · Computer Vision
Zhiheng Zhou
Center for Mind and Brain, University of California, Davis
Zhelin Li
School of Design, South China University of Technology, Guangzhou 510006, China