MSVCOD: A Large-Scale Multi-Scene Dataset for Video Camouflaged Object Detection

📅 2025-02-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing VCOD datasets exhibit severe bias toward wildlife, limiting their applicability in safety-critical and medical domains. To address this, we introduce MSVCOD—the first large-scale, multi-domain video camouflage object detection dataset—covering diverse targets including humans, animals, vehicles, and medical instruments against complex, dynamic backgrounds. Methodologically, we propose a single-stream spatiotemporal feature fusion framework that eliminates the need for auxiliary motion modules, and develop a semi-automatic iterative annotation pipeline to enhance both labeling efficiency and accuracy. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on both established animal-centric VCOD benchmarks and the new MSVCOD dataset. To foster community advancement, we will publicly release the dataset, source code, and trained models—establishing a new benchmark and practical resource for VCOD research.

📝 Abstract
Video Camouflaged Object Detection (VCOD) is a challenging task that aims to identify objects seamlessly concealed within the background of videos. The dynamic properties of video enable detection of camouflaged objects through motion cues or varied perspectives. Previous VCOD datasets primarily contain animal objects, limiting research to wildlife scenarios. However, the applications of VCOD extend beyond wildlife and have significant implications in the security, art, and medical fields. To address this problem, we construct a new large-scale multi-domain VCOD dataset, MSVCOD. To achieve high-quality annotations, we design a semi-automatic iterative annotation pipeline that reduces costs while maintaining annotation accuracy. MSVCOD is the largest VCOD dataset to date, introducing multiple object categories including human, animal, medical, and vehicle objects for the first time, while also expanding background diversity across various environments. This expanded scope increases the practical applicability of the VCOD task. Alongside the dataset, we introduce a one-stream video camouflaged object detection model that performs both feature extraction and information fusion without additional motion feature fusion modules. Our framework achieves state-of-the-art results on the existing animal-centric VCOD datasets and on the proposed MSVCOD. The dataset and code will be made publicly available.
Problem

Research questions and friction points this paper is trying to address.

Expands VCOD dataset beyond wildlife scenarios.
Introduces diverse object categories in VCOD.
Enhances VCOD applicability in multiple fields.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-automatic iterative annotation pipeline
One-stream video camouflaged object detection model
Multi-domain large-scale VCOD dataset MSVCOD
Shuyong Gao
Fudan University
Human Visual Attention, Generative Model, Weakly Supervised Learning

Yu'ang Feng
Fudan University, Shanghai, China

Qishan Wang
Fudan University
Anomaly Detection

Lingyi Hong
Fudan University
Computer Vision

Xinyu Zhou
Fudan University, Shanghai, China

Liu Fei
Keenon Robotics Co. Ltd, Shanghai, China

Yan Wang
Fudan University, Shanghai, China

Wenqiang Zhang
Fudan University, Shanghai, China