🤖 AI Summary
Camouflaged object detection (COD) is highly challenging because objects and their backgrounds share nearly identical color, texture, and structure. We identify six core difficulties: intrinsic similarity, ambiguous boundaries, scale variation, complex environments, context dependency, and ambiguity between salient and camouflaged regions. To address these, we propose a dual-path decoding architecture that jointly optimizes edge recovery and contextual localization. Specifically, we design a gradient-initialized edge enhancement module to improve boundary sensitivity; introduce an image-level contextual guidance mechanism to strengthen semantic discrimination; and employ spatial-gated attention to fuse early-stage features with contrastive learning representations. Our method achieves state-of-the-art S-measure scores of 0.898, 0.904, and 0.913 on COD10K, CAMO, and NC4K, respectively, demonstrating both superior accuracy and computational efficiency. The code and pretrained models are publicly available.
📝 Abstract
Camouflaged object detection identifies objects that blend seamlessly with their surroundings through similar colors, textures, and patterns. This task challenges both traditional segmentation methods and modern foundation models, which fail dramatically on camouflaged objects. We identify six fundamental challenges in COD: Intrinsic Similarity, Edge Disruption, Extreme Scale Variation, Environmental Complexities, Contextual Dependencies, and Salient-Camouflaged Object Disambiguation. These challenges frequently co-occur and compound the difficulty of detection, requiring comprehensive architectural solutions. We propose C3Net, which addresses all six challenges through a specialized dual-pathway decoder architecture. The Edge Refinement Pathway employs gradient-initialized Edge Enhancement Modules to recover precise boundaries from early features. The Contextual Localization Pathway utilizes our novel Image-based Context Guidance mechanism to achieve intrinsic saliency suppression without external models. An Attentive Fusion Module combines the two pathways via spatial gating. C3Net achieves state-of-the-art performance with S-measures of 0.898 on COD10K, 0.904 on CAMO, and 0.913 on NC4K, while maintaining efficient processing. C3Net demonstrates that complex, multifaceted detection challenges demand architectural innovation: specialized components working in concert provide comprehensive coverage that isolated improvements cannot. Code, model weights, and results are available at https://github.com/Baber-Jan/C3Net.
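The spatial gating idea behind the Attentive Fusion Module can be illustrated with a minimal, framework-free sketch. This is not the paper's implementation: it assumes the gate reduces to a per-pixel sigmoid weight (in the real model the gate logits would come from a small learned convolutional head), and all function names here are illustrative.

```python
import math

def sigmoid(x):
    """Standard logistic function, squashing a logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def spatial_gated_fusion(edge_feat, ctx_feat, gate_logits):
    """Fuse per-pixel edge and context features via a spatial gate.

    All arguments are H x W lists of scalars. `gate_logits` stands in
    for the output of a learned gating head (hypothetical here): a
    large positive logit favors the edge pathway, a large negative
    logit favors the context pathway.
    """
    H, W = len(edge_feat), len(edge_feat[0])
    fused = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            g = sigmoid(gate_logits[i][j])  # gate value in (0, 1)
            # Convex per-pixel blend of the two decoder pathways.
            fused[i][j] = g * edge_feat[i][j] + (1.0 - g) * ctx_feat[i][j]
    return fused
```

Because the gate is spatial, boundary pixels can lean on the edge pathway while interior pixels lean on contextual localization, which is the division of labor the dual-pathway design aims for.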