🤖 AI Summary
Existing salient object detection (SOD) and camouflaged object detection (COD) models suffer from task fragmentation and mutually exclusive dataset annotations (i.e., either salient or camouflaged objects only), rendering them inadequate for real-world scenes where both object types may coexist, appear independently, or be absent altogether. To address this, we propose the first unified framework for joint SOD and COD. We introduce USCOD, the first unconstrained, jointly annotated benchmark, together with CS12K, a large-scale dataset encompassing diverse co-occurrence patterns. We design USCNet, an attribute-decoupled network incorporating an APG module that learns both sample-generic and sample-specific features to distinguish salient from camouflaged objects. Furthermore, we propose the Camouflage-Saliency Confusion Score (CSCS), a novel evaluation metric quantifying the confusion between the two tasks. Our method achieves state-of-the-art performance on USCOD, and CSCS analysis confirms its effectiveness in mitigating task confusion. Code and datasets will be publicly released.
📝 Abstract
Visual Salient Object Detection (SOD) and Camouflaged Object Detection (COD) are two interrelated yet distinct tasks. Both model the human visual system's ability to perceive the presence of objects. Traditional SOD datasets and methods are designed for scenes containing only salient objects; similarly, COD datasets and methods target scenes containing only camouflaged objects. Scenes where salient and camouflaged objects coexist, or where neither is present, are not considered, which oversimplifies existing research on SOD and COD. In this paper, to explore a more generalized approach to SOD and COD, we introduce a benchmark called Unconstrained Salient and Camouflaged Object Detection (USCOD), which supports the simultaneous detection of salient and camouflaged objects in unconstrained scenes, regardless of whether either is present. To this end, we construct a large-scale dataset, CS12K, that covers a variety of scenes spanning four distinct types: only salient objects, only camouflaged objects, both, and neither. In our benchmark experiments, we identify a major challenge in USCOD: distinguishing salient from camouflaged objects within the same scene. To address this challenge, we propose USCNet, a baseline model for USCOD that decouples the learning of attribute distinction from mask reconstruction. The model incorporates an APG module, which learns both sample-generic and sample-specific features to enhance the attribute differentiation between salient and camouflaged objects. Furthermore, to evaluate models' ability to distinguish between salient and camouflaged objects, we design a metric called the Camouflage-Saliency Confusion Score (CSCS). The proposed method achieves state-of-the-art performance on the newly introduced USCOD task. The code and dataset will be publicly available.