D³ETOR: Debate-Enhanced Pseudo Labeling and Frequency-Aware Progressive Debiasing for Weakly-Supervised Camouflaged Object Detection with Scribble Annotations

📅 2025-12-23
🤖 AI Summary
This paper proposes a two-stage framework that addresses two key bottlenecks in weakly supervised camouflaged object detection (WSCOD): unreliable pseudo-labels and scribble annotation bias. In Stage I, a multi-agent debate mechanism and adaptive entropy-driven point sampling make the pseudo-masks generated by SAM more task-specific and more reliable. In Stage II, a frequency-aware progressive debiasing network (FADeNet) combines DCT/DWT decomposition, multi-level frequency feature fusion, and dynamic region-reweighted supervision to model global structure and local detail jointly while explicitly correcting scribble bias. This work is the first to integrate multi-agent debate-based pseudo-label generation and frequency-domain debiasing into WSCOD. Evaluated on CAMO, COD10K, and NC4K, it achieves an mIoU of 62.3%, outperforming existing weakly supervised methods and narrowing the gap to fully supervised state-of-the-art methods to within 3.5%.
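The summary cites DCT/DWT decomposition as the basis of FADeNet's frequency-aware features. As a rough illustration of what a frequency split means, the sketch below applies an orthonormal 2D DCT-II to a small square patch, keeps coefficients with u + v < cutoff as the low-frequency part, treats the rest as high frequency, and maps both parts back to the spatial domain. The function names and the diagonal cutoff rule are hypothetical; this is not FADeNet's decomposition module, which the abstract does not specify.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def dct_basis(n: int) -> Matrix:
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    basis = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                      for i in range(n)])
    return basis


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def frequency_split(patch: Matrix, cutoff: int) -> Tuple[Matrix, Matrix]:
    """Split a square patch into (low, high) frequency components.

    Hypothetical rule: DCT coefficients with u + v < cutoff count as low
    frequency (coarse structure); all others count as high frequency
    (edges, texture).
    """
    n = len(patch)
    c = dct_basis(n)
    ct = transpose(c)
    coeffs = matmul(matmul(c, patch), ct)  # forward 2D DCT: C X C^T

    low = [[coeffs[u][v] if u + v < cutoff else 0.0 for v in range(n)]
           for u in range(n)]
    high = [[coeffs[u][v] - low[u][v] for v in range(n)] for u in range(n)]

    def inverse(y: Matrix) -> Matrix:
        # The basis is orthonormal, so the inverse 2D DCT is C^T Y C.
        return matmul(matmul(ct, y), c)

    return inverse(low), inverse(high)


if __name__ == "__main__":
    patch = [[float((4 * i + j) % 7) for j in range(4)] for i in range(4)]
    low, high = frequency_split(patch, cutoff=2)
    for row in low:
        print([round(v, 2) for v in row])
```

Because the DCT basis is orthonormal, the two components sum back to the input patch up to floating-point error, and a perfectly flat patch has no high-frequency content at all.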

📝 Abstract
Weakly-Supervised Camouflaged Object Detection (WSCOD) aims to locate and segment objects that are visually concealed within their surrounding scenes, relying solely on sparse supervision such as scribble annotations. Despite recent progress, existing WSCOD methods still lag far behind fully supervised ones due to two major limitations: (1) the pseudo masks generated by general-purpose segmentation models (e.g., SAM) and filtered via rules are often unreliable, as these models lack the task-specific semantic understanding required for effective pseudo labeling in COD; and (2) the neglect of inherent annotation bias in scribbles, which hinders the model from capturing the global structure of camouflaged objects. To overcome these challenges, we propose D³ETOR, a two-stage WSCOD framework consisting of Debate-Enhanced Pseudo Labeling and Frequency-Aware Progressive Debiasing. In the first stage, we introduce an adaptive entropy-driven point sampling method and a multi-agent debate mechanism to enhance the capability of SAM for COD, improving the interpretability and precision of pseudo masks. In the second stage, we design FADeNet, which progressively fuses multi-level frequency-aware features to balance global semantic understanding with local detail modeling, while dynamically reweighting supervision strength across regions to alleviate scribble bias. By jointly exploiting the supervision signals from both the pseudo masks and scribble semantics, D³ETOR significantly narrows the gap between weakly and fully supervised COD, achieving state-of-the-art performance on multiple benchmarks.
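The abstract does not describe how adaptive entropy-driven point sampling works. One plausible reading, sketched here under that assumption, is to compute the binary entropy of a coarse foreground-probability map and use the most confident pixels as positive and negative point prompts for SAM, with the entropy threshold adapted to the mean entropy of the map. `sample_prompt_points`, the mean-entropy threshold, and `max_points` are all hypothetical choices, not the paper's procedure.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[int, int]  # (row, col)


def binary_entropy(p: float, eps: float = 1e-7) -> float:
    """Entropy (in nats) of a Bernoulli(p) prediction; clamped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def sample_prompt_points(
    prob: List[List[float]],
    max_points: int = 3,
    threshold: Optional[float] = None,
) -> Tuple[List[Point], List[Point]]:
    """Return (positive, negative) point prompts ranked by confidence.

    Hypothetical rule: a pixel is a candidate if its entropy is at most
    `threshold`; when no threshold is given, the mean entropy of the whole
    map is used, so the cutoff adapts to how uncertain the map is overall.
    """
    entropies = [[binary_entropy(p) for p in row] for row in prob]
    if threshold is None:
        flat = [h for row in entropies for h in row]
        threshold = sum(flat) / len(flat)

    fg: List[Tuple[float, Point]] = []
    bg: List[Tuple[float, Point]] = []
    for r, row in enumerate(prob):
        for c, p in enumerate(row):
            h = entropies[r][c]
            if h <= threshold:
                (fg if p >= 0.5 else bg).append((h, (r, c)))

    # Lowest entropy first: the most confident pixels become prompts.
    fg.sort()
    bg.sort()
    return [pt for _, pt in fg[:max_points]], [pt for _, pt in bg[:max_points]]


if __name__ == "__main__":
    coarse = [[0.99, 0.6, 0.02],
              [0.95, 0.5, 0.10]]
    positives, negatives = sample_prompt_points(coarse)
    print(positives, negatives)  # -> [(0, 0), (1, 0)] [(0, 2), (1, 2)]
```

In the example, the uncertain pixels (p = 0.6 and p = 0.5) have entropy above the map's mean and are never used as prompts, which is the intended effect of an entropy filter.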
Problem

Enhance pseudo mask reliability for camouflaged object detection
Address scribble annotation bias in weakly-supervised segmentation
Bridge performance gap between weak and full supervision
Innovation

Debate-enhanced pseudo labeling improves SAM's camouflage detection precision
Frequency-aware progressive debiasing balances global and local feature fusion
Dynamic reweighting of supervision alleviates scribble annotation bias
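The abstract states that FADeNet combines pseudo-mask and scribble supervision while reweighting supervision strength across regions. The sketch below shows one simple, fixed weighting of that kind: scribbled pixels get full weight, unlabeled pixels fall back to the pseudo mask with a weight proportional to its confidence |2p - 1|, and a weighted binary cross-entropy is normalized by the total weight. The helper names, `pseudo_scale`, and the confidence formula are assumptions; the paper's dynamic reweighting scheme is not reproduced here.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def build_targets_and_weights(
    scribble: List[List[Optional[int]]],
    pseudo: Matrix,
    pseudo_scale: float = 0.5,
) -> Tuple[Matrix, Matrix]:
    """Per-pixel targets and weights from scribbles plus a pseudo mask.

    Hypothetical scheme:
      * scribbled pixel (label 0 or 1): target = label, weight = 1
      * unlabeled pixel (None): target = pseudo mask rounded at 0.5,
        weight = pseudo_scale * |2p - 1| (0 at p = 0.5, pseudo_scale at p = 0 or 1)
    """
    targets: Matrix = []
    weights: Matrix = []
    for s_row, p_row in zip(scribble, pseudo):
        t_row, w_row = [], []
        for s, p in zip(s_row, p_row):
            if s is not None:
                t_row.append(float(s))
                w_row.append(1.0)
            else:
                t_row.append(1.0 if p >= 0.5 else 0.0)
                w_row.append(pseudo_scale * abs(2.0 * p - 1.0))
        targets.append(t_row)
        weights.append(w_row)
    return targets, weights


def weighted_bce(pred: Matrix, targets: Matrix, weights: Matrix,
                 eps: float = 1e-7) -> float:
    """Binary cross-entropy averaged over pixels with per-pixel weights."""
    total, norm = 0.0, 0.0
    for p_row, t_row, w_row in zip(pred, targets, weights):
        for p, t, w in zip(p_row, t_row, w_row):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total += -w * (t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            norm += w
    return total / norm if norm > 0 else 0.0


if __name__ == "__main__":
    scribble = [[1, None], [None, 0]]
    pseudo = [[0.9, 0.5], [1.0, 0.2]]
    targets, weights = build_targets_and_weights(scribble, pseudo)
    print(weights)  # -> [[1.0, 0.0], [0.5, 1.0]]
    print(weighted_bce([[0.5, 0.5], [0.5, 0.5]], targets, weights))
```

Pixels where the pseudo mask is maximally uncertain (p = 0.5) receive zero weight, so they neither help nor hurt training; a dynamic scheme would additionally adjust these weights as training progresses.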
Jiawei Ge
School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China
Jiuxin Cao
School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China
Xinyi Li
School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China
Xuelin Zhu
Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University, Hong Kong, China
Chang Liu
School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China
Bo Liu
School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
Chen Feng
School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, Belfast, U.K.
Ioannis Patras
Professor, Queen Mary, University of London
Computer Vision, Machine Learning, Artificial Intelligence, Face and gesture recognition, Multimedia Analysis