🤖 AI Summary
Semi-supervised camouflaged object detection suffers from severe pseudo-label bias, error accumulation during propagation, and, owing to multi-network architectures, high computational overhead and poor scalability.
Method: This paper proposes a lightweight, single-model self-training framework built upon the Segment Anything Model (SAM). It integrates hybrid prompt learning with domain-adaptive prompt transformation to dynamically select high-confidence pseudo-labels and inject domain-specific prior knowledge—thereby mitigating prediction bias and cumulative errors. Crucially, it eliminates the need for teacher-student paradigms or multi-branch designs, relying solely on iterative optimization of a single model for efficient label utilization.
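The core self-training loop described above, dynamically selecting high-confidence pseudo-labels and converting them into prompts for SAM, can be sketched as follows. Note this is a minimal illustrative sketch: the function names, the mean-foreground-confidence selection rule, and the box/point prompt construction are assumptions for exposition, not the paper's exact implementation.

```python
import numpy as np

def filter_pseudo_labels(prob_maps, conf_thresh=0.9):
    """Keep pseudo-labels whose mean foreground confidence exceeds a threshold
    (an assumed selection rule; the paper's exact criterion may differ)."""
    selected = []
    for i, prob in enumerate(prob_maps):
        mask = prob > 0.5                      # binarize the prediction
        if mask.any() and prob[mask].mean() >= conf_thresh:
            selected.append((i, mask.astype(np.uint8)))
    return selected

def mask_to_prompts(mask):
    """Convert a binary pseudo-mask into a box prompt and a positive point
    prompt, the kinds of sparse prompts SAM accepts."""
    ys, xs = np.nonzero(mask)
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    point = (int(xs.mean()), int(ys.mean()))   # centroid as positive click
    return box, point
```

In a full self-training round, the single model would predict on unlabeled images, retain only the filtered masks, feed the derived prompts (plus domain-specific priors) to SAM, and retrain on the expanded label set.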
Contribution/Results: With only 1% labeled data, our method surpasses existing semi-supervised approaches and matches the performance of fully supervised models, while significantly reducing computational cost and improving deployment scalability.
📝 Abstract
Semi-supervised Camouflaged Object Detection (SSCOD) aims to reduce reliance on costly pixel-level annotations by leveraging limited annotated data and abundant unlabeled data. However, existing SSCOD methods based on Teacher-Student frameworks suffer from severe prediction bias and error propagation under scarce supervision, while their multi-network architectures incur high computational overhead and limited scalability. To overcome these limitations, we propose ST-SAM, a highly annotation-efficient yet concise framework that breaks away from conventional SSCOD constraints. Specifically, ST-SAM employs a Self-Training strategy that dynamically filters and expands high-confidence pseudo-labels to enhance a single-model architecture, thereby fundamentally circumventing inter-model prediction bias. Furthermore, by transforming pseudo-labels into hybrid prompts containing domain-specific knowledge, ST-SAM effectively harnesses the Segment Anything Model's potential for specialized tasks to mitigate error accumulation in self-training. Experiments on COD benchmark datasets demonstrate that ST-SAM achieves state-of-the-art performance with only 1% labeled data, outperforming existing SSCOD methods and even matching fully supervised methods. Remarkably, ST-SAM requires training only a single network, without relying on specific models or loss functions. This work establishes a new paradigm for annotation-efficient SSCOD. Code will be available at https://github.com/hu-xh/ST-SAM.