Integrating Extra Modality Helps Segmentor Find Camouflaged Objects Well

📅 2025-02-20
🤖 AI Summary
To address the scarcity of visible-light cues and the performance limitations of unimodal methods in camouflaged object segmentation (COS), this paper proposes UniCOS, the first framework to employ state-space models (SSMs) for dynamic cross-modal feature modeling and fusion. Its key contributions are: (1) a state-space-driven cross-modal fusion mechanism with a feedback architecture; (2) the UniLearner module, which leverages non-COS multimodal data (e.g., infrared, depth) to synthesize pseudo-modal content and establish semantic correspondences, enabling label-free knowledge transfer; and (3) joint semantic alignment learning and multimodal knowledge distillation. Evaluated on both real and pseudo-multimodal COS benchmarks, UniCOS achieves significant improvements over state-of-the-art methods, gaining +4.2% mIoU using only off-the-shelf non-COS multimodal data, demonstrating its effectiveness in bridging the modality gap without requiring COS-specific annotations.
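The "unified state space" fusion idea described above can be sketched with a toy illustration. This is not the paper's UniSEG module; the function names (`ssm_scan`, `fuse`), the scalar recurrence, and all parameter values are hypothetical stand-ins. The point is only the mechanism: features from both modalities are scanned by a single linear state-space recurrence, so one shared hidden state carries cross-modal context.

```python
# Toy sketch of state-space cross-modal fusion (NOT the paper's code).
# A scalar linear SSM: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.
def ssm_scan(xs, a=0.9, b=0.5, c=1.0):
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys

def fuse(rgb_feats, aux_feats, a=0.9, b=0.5):
    # Interleave per-position features from both modalities so a single
    # recurrent state mixes cross-modal context (the "unified state space").
    interleaved = [v for pair in zip(rgb_feats, aux_feats) for v in pair]
    ys = ssm_scan(interleaved, a=a, b=b)
    # De-interleave: even positions refine RGB, odd positions refine aux.
    return ys[0::2], ys[1::2]

rgb = [1.0, 0.0, 1.0]   # hypothetical per-position RGB features
aux = [0.5, 0.5, 0.5]   # hypothetical auxiliary-modality features
fused_rgb, fused_aux = fuse(rgb, aux)
```

In the real model the scan runs over high-dimensional feature maps with learned, input-dependent SSM parameters, and a fusion-feedback path routes fused features back into the modality encoders; the sketch keeps only the shared-state scanning idea.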

📝 Abstract
Camouflaged Object Segmentation (COS) remains a challenging problem due to the subtle visual differences between camouflaged objects and backgrounds. Owing to the exceedingly limited visual cues available from the visible spectrum, previous RGB single-modality approaches often struggle to achieve satisfactory results, prompting the exploration of multimodal data to enhance detection accuracy. In this work, we present UniCOS, a novel framework that effectively leverages diverse data modalities to improve segmentation performance. UniCOS comprises two key components: a multimodal segmentor, UniSEG, and a cross-modal knowledge learning module, UniLearner. UniSEG employs a state space fusion mechanism to integrate cross-modal features within a unified state space, enhancing contextual understanding and improving robustness to the integration of heterogeneous data. Additionally, it includes a fusion-feedback mechanism that facilitates feature extraction. UniLearner exploits multimodal data unrelated to the COS task to improve the segmentation ability of COS models by generating pseudo-modal content and cross-modal semantic associations. Extensive experiments demonstrate that UniSEG outperforms existing Multimodal COS (MCOS) segmentors, regardless of whether real or pseudo-multimodal COS data is available. Moreover, in scenarios where multimodal COS data is unavailable but multimodal non-COS data is accessible, UniLearner effectively exploits these data to enhance segmentation performance. Our code will be made publicly available on GitHub (https://github.com/cnyvfang/UniCOS).
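UniLearner's label-free transfer idea (learn an RGB-to-auxiliary-modality mapping on non-COS paired data, then synthesize pseudo-modal inputs for unpaired COS images) can be illustrated with a deliberately simple stand-in. The real module is a learned translation network; here a least-squares affine map plays its role, and all function names and values are hypothetical.

```python
# Illustrative-only sketch of the UniLearner idea (NOT the paper's code):
# fit an RGB -> depth mapping on non-COS paired data, then use it to
# synthesize pseudo-depth for COS images that lack a second modality.
def fit_affine(xs, ys):
    # Ordinary least squares for y ~ w*x + b.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return w, my - w * mx

# Hypothetical non-COS paired samples (RGB intensity vs. depth):
rgb_noncos = [0.1, 0.4, 0.7, 0.9]
dep_noncos = [0.2, 0.5, 0.8, 1.0]
w, b = fit_affine(rgb_noncos, dep_noncos)

# Synthesize pseudo-depth for COS images with no paired depth:
rgb_cos = [0.3, 0.6]
pseudo_depth = [w * x + b for x in rgb_cos]  # ≈ [0.4, 0.7]
```

The synthesized pseudo-modal input would then be fed to the multimodal segmentor alongside the RGB image, which is how the framework benefits from non-COS multimodal data without any COS-specific multimodal annotations.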
Problem

Research questions and friction points this paper is trying to address.

Enhance camouflaged object segmentation accuracy
Integrate diverse data modalities effectively
Improve segmentation with pseudo-modal content
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages multimodal data
Unifies state space features
Generates pseudo-modal content
Chengyu Fang
Tsinghua University & Alibaba DAMO Academy
Computer Vision · Medical AI · Efficient MLLM
Chunming He
Duke University | Tsinghua University
Computer Vision · Machine Learning · Biomedical Image Analysis
Longxiang Tang
Tsinghua University
Computer Vision
Yuelin Zhang
MAE, The Chinese University of Hong Kong, Hong Kong, China
Chenyang Zhu
SIGS, Tsinghua University, Shenzhen, China
Yuqi Shen
SIGS, Tsinghua University, Shenzhen, China
Chubin Chen
Tsinghua University
Generative AI
Guoxia Xu
SCIE, Nanjing University of Posts and Telecommunications, Nanjing, China
Xiu Li
Bytedance Seed
Computer Vision · Computer Graphics · 3D Vision