🤖 AI Summary
To address the issues of model complexity, poor robustness, and insufficient real-time performance in video camouflage object detection (VCOD), this paper proposes GreenVCOD—a lightweight and efficient solution. Methodologically, it introduces the first “green” VCOD paradigm: instead of relying on computationally expensive optical flow or explicit motion modeling, it designs a Long-/Short-Term Temporal Neighborhood (TN) module to jointly model spatiotemporal context at low computational cost, thereby enhancing inter-frame consistency and detection stability. Built upon a single-frame camouflage object detection (COD) backbone, GreenVCOD adopts a lightweight CNN architecture and incorporates only a minimal number of learnable parameters for adaptive spatiotemporal feature fusion. Evaluated on mainstream VCOD benchmarks, it achieves state-of-the-art performance—improving F-measure by 2.1–3.8%, reducing computational cost by 37%, and accelerating inference speed by 2.4×—demonstrating superior efficiency and accuracy, particularly suitable for resource-constrained edge devices.
📝 Abstract
Camouflaged object detection (COD) aims to distinguish hidden objects embedded in an environment highly similar to the object. Conventional video-based COD (VCOD) methods explicitly extract motion cues or employ complex deep learning networks to handle the temporal information, which is limited by high complexity and unstable performance. In this work, we propose a green VCOD method named GreenVCOD. Built upon a green ICOD method, GreenVCOD uses long- and short-term temporal neighborhoods (TN) to capture joint spatial/temporal context information for decision refinement. Experimental results show that GreenVCOD offers competitive performance compared to state-of-the-art VCOD benchmarks.