Mamba-based Efficient Spatio-Frequency Motion Perception for Video Camouflaged Object Detection

📅 2025-07-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video camouflaged object detection (VCOD) methods rely on spatial appearance features to perceive motion cues; however, the low discriminability of color and texture features, caused by high foreground-background similarity, severely limits detection accuracy and completeness. To address this, the authors propose Vcamba, a Mamba-based spatio-frequency motion perception model. Vcamba introduces a novel frequency-domain sequential scanning strategy and dual-domain (spatial and frequency) long-range motion perception modules. It integrates three key components: adaptive frequency component enhancement (AFE), receptive field visual state space (RFVSS), and space and frequency motion fusion (SFMF), enabling efficient dynamic modeling of camouflaged motion. Evaluated on two mainstream benchmarks, Vcamba achieves state-of-the-art performance across all six metrics while reducing computational overhead. The method significantly enhances motion cue extraction and detection robustness in complex camouflage scenarios.

📝 Abstract
Existing video camouflaged object detection (VCOD) methods primarily rely on spatial appearance features to perceive motion cues for breaking camouflage. However, the high similarity between foreground and background in VCOD results in limited discriminability of spatial appearance features (e.g., color and texture), restricting detection accuracy and completeness. Recent studies demonstrate that frequency features can not only enhance feature representation to compensate for appearance limitations but also perceive motion through dynamic variations in frequency energy. Furthermore, the emerging state space model Mamba enables efficient perception of motion cues in frame sequences due to its linear-time long-sequence modeling capability. Motivated by this, we propose a novel visual camouflage Mamba (Vcamba) based on spatio-frequency motion perception that integrates frequency and spatial features for efficient and accurate VCOD. Specifically, we propose a receptive field visual state space (RFVSS) module to extract multi-scale spatial features after sequence modeling. For frequency learning, we introduce an adaptive frequency component enhancement (AFE) module with a novel frequency-domain sequential scanning strategy to maintain semantic consistency. Then we propose a space-based long-range motion perception (SLMP) module and a frequency-based long-range motion perception (FLMP) module to model spatio-temporal and frequency-temporal sequences in spatial and frequency phase domains. Finally, the space and frequency motion fusion (SFMF) module integrates dual-domain features for unified motion representation. Experimental results show that our Vcamba outperforms state-of-the-art methods across 6 evaluation metrics on 2 datasets with lower computation cost, confirming the superiority of Vcamba. Our code is available at: https://github.com/BoydeLi/Vcamba.
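The abstract's core frequency cue, perceiving motion through dynamic variations in frequency energy, can be illustrated with a minimal NumPy sketch. This is only a toy illustration of the idea, not the paper's AFE or FLMP modules; the function name and the synthetic moving-patch video are invented for the example. Comparing complex spectra (rather than amplitude alone) keeps the measure sensitive to phase, which the abstract highlights via its frequency phase domain.

```python
import numpy as np

def frequency_energy_motion(frames):
    """Toy motion score from frame-to-frame changes in the 2D spectrum.

    frames: array of shape (T, H, W), a grayscale video clip.
    Returns T-1 scores, one per consecutive frame pair. This only
    illustrates 'perceiving motion through dynamic variations in
    frequency energy'; it is not Vcamba's actual AFE/FLMP design.
    """
    # Complex spectrum per frame; the difference is phase-sensitive,
    # so a purely translated pattern still registers as motion.
    spec = np.fft.fft2(frames, axes=(-2, -1))       # (T, H, W), complex
    diffs = np.abs(spec[1:] - spec[:-1])            # (T-1, H, W)
    return diffs.mean(axis=(-2, -1))                # (T-1,)

# Synthetic clip: a static black background with a bright 4x4 patch
# sliding rightward by 4 pixels per frame.
T, H, W = 4, 32, 32
frames = np.zeros((T, H, W))
for t in range(T):
    frames[t, 8:12, 4 + 4 * t : 8 + 4 * t] = 1.0

scores = frequency_energy_motion(frames)
print(scores)  # three positive values: the patch moves between every frame pair
```

A fully static clip yields zero scores under this measure, so thresholding the score already separates moving from static content in this toy setting.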
Problem

Research questions and friction points this paper is trying to address.

Enhance video camouflaged object detection using spatio-frequency motion perception
Overcome spatial feature limitations with frequency and Mamba-based modeling
Integrate spatial and frequency domains for efficient, accurate motion representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mamba-based spatio-frequency motion perception
Adaptive frequency component enhancement module
Space and frequency motion fusion module
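Mamba's linear-time long-sequence modeling, which the abstract cites as the basis for efficient motion perception, rests on a state-space recurrence that can be sketched as follows. This fixed-parameter diagonal scan is a deliberate simplification: real Mamba makes the parameters input-dependent ("selective") and uses a hardware-aware parallel scan, and all names below are illustrative.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Minimal diagonal state-space recurrence.

    h_t = a * h_{t-1} + b * x_t   (elementwise over N hidden states)
    y_t = <c, h_t>

    Cost is O(T * N) in sequence length T, unlike the O(T^2) cost of
    full self-attention; this is the property that lets Mamba-style
    models scan long frame sequences cheaply.
    """
    h = np.zeros(a.shape[0])
    ys = []
    for x_t in x:                 # one linear-time pass over the sequence
        h = a * h + b * x_t       # decayed state plus new input
        ys.append(c @ h)          # readout
    return np.array(ys)

rng = np.random.default_rng(0)
T, N = 16, 8
a = rng.uniform(0.1, 0.9, N)      # decay in (0, 1) keeps the state stable
b = rng.normal(size=N)
c = rng.normal(size=N)
x = rng.normal(size=T)

y = ssm_scan(x, a, b, c)
print(y.shape)  # (16,)
```

Because each output depends on an exponentially decaying summary of all earlier inputs, the scan carries long-range context at constant memory per step, which is the intuition behind "long-range motion perception" over frame sequences.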