Distribution-Specific Learning for Joint Salient and Camouflaged Object Detection

📅 2025-08-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the performance degradation that arises when salient object detection (SOD) and camouflaged object detection (COD) are learned jointly, a degradation attributed to conflicting task objectives. It proposes SCJoint, a joint learning framework. The method employs a fully shared encoder coupled with distribution-specific decoders, introduces for the first time a distribution-aware parameter decoupling mechanism that models feature means and variances, and integrates saliency-guided hard example mining to improve training efficiency and data quality. Key contributions include: (i) the first empirical validation of synergistic gains between SOD and COD; (ii) learnable decoder parameters that let knowledge from the two tasks complement rather than interfere with each other; and (iii) JoNet, the instantiated model, which achieves state-of-the-art or competitive performance on both tasks, with faster convergence and a more balanced computational load.

📝 Abstract
Salient object detection (SOD) and camouflaged object detection (COD) are two closely related but distinct computer vision tasks. Although both are class-agnostic segmentation tasks that map from RGB space to binary space, the former aims to identify the most salient objects in the image, while the latter focuses on detecting well-camouflaged objects that blend into the background. The two tasks therefore have strongly contradictory attributes. Most previous work has held that learning these two tasks jointly confuses the network and reduces its performance on both. Here we present the opposite perspective: with the correct approach to learning, the network can find both salient and camouflaged objects, and both tasks benefit from joint learning. We propose SCJoint, a joint learning scheme for the SOD and COD tasks, which assumes that the decoding processes of SOD and COD have different distribution characteristics. The key to our method is to learn the respective means and variances of the decoding processes for both tasks by inserting a minimal number of task-specific learnable parameters into a fully shared network structure, thereby decoupling the contradictory attributes of the two tasks at minimal cost. Furthermore, we propose a saliency-based sampling strategy (SBSS) that samples the training set of the SOD task to balance the training set sizes of the two tasks. In addition, SBSS improves the training set quality and shortens the training time. Based on the proposed SCJoint and SBSS, we train a powerful generalist network, named JoNet, which can capture both "salient" and "camouflaged" objects simultaneously. Extensive experiments demonstrate the competitive performance and effectiveness of our proposed method. The code is available at https://github.com/linuxsino/JoNet.
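The idea of learning task-specific means and variances inside an otherwise shared network can be illustrated with a minimal sketch. This is not the JoNet implementation: the class names `TaskAffine` and `DistributionSpecificNorm`, the task keys `"sod"` and `"cod"`, and the choice to realize the idea as a shared normalization followed by a per-task shift and scale are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from statistics import fmean, pvariance


@dataclass
class TaskAffine:
    """Hypothetical per-task parameters: a shift (target mean) and a scale (target std)."""
    shift: float = 0.0
    scale: float = 1.0


@dataclass
class DistributionSpecificNorm:
    """Shared normalization followed by a task-selected shift and scale (assumed design)."""
    eps: float = 1e-5
    params: dict = field(
        default_factory=lambda: {"sod": TaskAffine(), "cod": TaskAffine()}
    )

    def __call__(self, features, task):
        # Shared step: standardize the features to zero mean and unit variance.
        mu = fmean(features)
        var = pvariance(features, mu)
        p = self.params[task]
        # Task-specific step: only these two numbers differ between SOD and COD.
        return [p.scale * (x - mu) / (var + self.eps) ** 0.5 + p.shift for x in features]


# The same shared features are decoded under two different output distributions.
norm = DistributionSpecificNorm()
norm.params["cod"] = TaskAffine(shift=0.5, scale=2.0)  # illustrative values only
shared_features = [1.0, 2.0, 3.0, 4.0]
sod_out = norm(shared_features, "sod")
cod_out = norm(shared_features, "cod")
```

In this sketch every weight except the two numbers in `TaskAffine` is shared, which mirrors the abstract's claim that the tasks are decoupled at minimal parameter cost; in a real network these parameters would be learned per channel.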
Problem

Research questions and friction points this paper is trying to address.

Joint learning for salient and camouflaged object detection
Decoupling contradictory attributes in SOD and COD tasks
Balancing and improving training set quality with SBSS
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint learning scheme SCJoint for SOD and COD
Task-specific parameters in shared network
Saliency-based sampling strategy SBSS
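The balancing role of SBSS can be sketched as a simple score-based subsampling step. The selection criterion used by the paper is not given on this page, so the function name `sample_by_score`, the per-image `scores` list, and the rule "keep the highest-scoring SOD images until the subset matches the COD set size" are hypothetical stand-ins, not the authors' procedure.

```python
def sample_by_score(sod_samples, scores, target_size):
    """Keep the `target_size` highest-scoring SOD samples (assumed criterion).

    `scores` is a hypothetical per-sample saliency-derived score; whether the
    paper prefers high or low values is not stated here.
    """
    if len(sod_samples) != len(scores):
        raise ValueError("each sample needs exactly one score")
    if target_size >= len(sod_samples):
        return list(sod_samples)
    # Rank sample indices by score, highest first.
    ranked = sorted(range(len(sod_samples)), key=lambda i: scores[i], reverse=True)
    # Restore the original dataset order among the kept samples.
    keep = sorted(ranked[:target_size])
    return [sod_samples[i] for i in keep]


# Usage: shrink a 4-image SOD set to match a 2-image COD set.
sod_subset = sample_by_score(["a", "b", "c", "d"], [0.1, 0.9, 0.5, 0.7], target_size=2)
```

Matching the two training set sizes this way also means fewer SOD images per epoch, which is consistent with the abstract's statement that SBSS shortens training time.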
Chao Hao
School of Computing and Information Technology, Great Bay University, Dongguan 523000, China
Zitong Yu
U.S. Food and Drug Administration
Medical imaging, Deep learning, Machine learning, Image reconstruction
Xin Liu
Computer Vision and Pattern Recognition Laboratory, School of Engineering Science, Lappeenranta-Lahti University of Technology LUT, Lappeenranta 53850, Finland
Yuhao Wang
School of Computing and Information Technology, Great Bay University, Dongguan 523000, China
Weicheng Xie
Associate Professor, Shenzhen University
Facial expression analysis, Deep learning, Image processing
Jingang Shi
Xi'an Jiaotong University
computer vision, face analysis, image restoration, physiological signal analysis
Huanjing Yue
School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
Jingyu Yang
School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China