First RAG, Second SEG: A Training-Free Paradigm for Camouflaged Object Detection

📅 2025-08-21
🤖 AI Summary
Camouflaged object detection (COD) is highly challenging because objects are visually very similar to their backgrounds. Existing methods rely heavily on large-scale training data and substantial computational resources, while foundation models such as SAM generalize well but need careful fine-tuning and high-quality handcrafted prompts. This paper proposes RAG-SEG, a training-free two-stage framework: (1) feature retrieval against a compact database built by unsupervised clustering produces pseudo-labels that form coarse masks; (2) these coarse masks serve as prompts for SAM2, which refines them into precise segmentations. The work is presented as the first to bring retrieval-augmented generation (RAG) to COD, removing the need for both manual prompt engineering and model fine-tuning. RAG-SEG matches or surpasses state-of-the-art methods on multiple standard COD benchmarks. All experiments were run on a single personal laptop, underscoring the method's computational efficiency, deployability, and practicality.
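The hand-off between the two stages can be illustrated with a small sketch. SAM2 accepts point, box, and mask prompts; the summary does not say which prompt types RAG-SEG derives from its coarse masks, so the choice below (one tight bounding box plus one positive click) is an assumption, and `mask_to_prompts` is a hypothetical helper, not the authors' code. The actual SAM2 call is deliberately left out.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # coarse binary mask: rows of 0/1 values
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)
Point = Tuple[int, int]  # (x, y)


def mask_to_prompts(mask: Mask) -> Optional[Tuple[Box, Point]]:
    """Turn a coarse mask into one box prompt and one positive point prompt.

    Returns None when the mask has no foreground pixels, since there is
    nothing to prompt the segmenter with.
    """
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    # Tight bounding box around every foreground pixel.
    box = (min(xs), min(ys), max(xs), max(ys))
    # Snap the foreground centroid to the nearest foreground pixel so the click
    # lands on the object even when the coarse mask is non-convex.
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    point = min(coords, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
    return box, point


if __name__ == "__main__":
    coarse = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    print(mask_to_prompts(coarse))  # ((1, 1, 2, 2), (1, 1))
```

Snapping the click to a real foreground pixel, rather than using the raw centroid, avoids placing a positive point on background when the retrieved mask is ring- or L-shaped.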

📝 Abstract
Camouflaged object detection (COD) poses a significant challenge in computer vision due to the high similarity between objects and their backgrounds. Existing approaches often rely on heavy training and large computational resources. While foundation models such as the Segment Anything Model (SAM) offer strong generalization, they still struggle to handle COD tasks without fine-tuning and require high-quality prompts to yield good performance. However, generating such prompts manually is costly and inefficient. To address these challenges, we propose First RAG, Second SEG (RAG-SEG), a training-free paradigm that decouples COD into two stages: Retrieval-Augmented Generation (RAG) for generating coarse masks as prompts, followed by SAM-based segmentation (SEG) for refinement. RAG-SEG constructs a compact retrieval database via unsupervised clustering, enabling fast and effective feature retrieval. During inference, the retrieved features produce pseudo-labels that guide precise mask generation using SAM2. Our method eliminates the need for conventional training while maintaining competitive performance. Extensive experiments on benchmark COD datasets demonstrate that RAG-SEG performs on par with or surpasses state-of-the-art methods. Notably, all experiments are conducted on a personal laptop, highlighting the computational efficiency and practicality of our approach. We present further analysis in the Appendix, covering limitations, an extension to salient object detection, and possible improvements.
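A compact retrieval database of the kind described above might be built as in the following sketch. The abstract only says "unsupervised clustering", so several details here are assumptions: k-means as the clustering method, patch features coming from some frozen encoder (not shown), labels taken from ground-truth masks of a labeled image set, and foreground and background features being clustered separately. `build_database`, `kmeans`, and `k_per_class` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Entry = Tuple[Vector, int]  # (centroid feature, label): 1 = camouflaged object, 0 = background


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means; k is capped at the number of points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, min(k, len(points)))]
    for _ in range(iters):
        # Assignment step: every point joins its nearest centroid.
        groups: List[List[Vector]] = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group (empty groups keep theirs).
        for i, group in enumerate(groups):
            if group:
                centroids[i] = [sum(dim) / len(group) for dim in zip(*group)]
    return centroids


def build_database(features: List[Vector], labels: List[int], k_per_class: int) -> List[Entry]:
    """Cluster background and foreground patch features separately.

    The result holds at most 2 * k_per_class (centroid, label) entries, however
    many patch features went in: this is what keeps the database compact.
    """
    db: List[Entry] = []
    for cls in (0, 1):
        members = [f for f, y in zip(features, labels) if y == cls]
        if members:
            db.extend((c, cls) for c in kmeans(members, k_per_class))
    return db


if __name__ == "__main__":
    feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    labs = [0, 0, 0, 1, 1, 1]
    for centroid, label in build_database(feats, labs, k_per_class=1):
        print(label, [round(v, 3) for v in centroid])
```

Clustering per class keeps every centroid's label unambiguous; clustering all features jointly and attaching an averaged label to each centroid would be an equally plausible variant.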
Problem

Research questions and friction points this paper is trying to address.

Detecting camouflaged objects without training requirements
Reducing computational resources for segmentation tasks
Eliminating manual prompt generation for SAM models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free paradigm combining RAG and SAM
Unsupervised clustering for compact retrieval database
Retrieval-generated pseudo-labels that guide SAM2 segmentation refinement
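Turning retrieval into pseudo-labels could look like the sketch below. The abstract states that retrieved features produce pseudo-labels but not the rule, so k-nearest-neighbor label averaging with Euclidean distance and a 0.5 threshold are assumptions, as is the database format (a list of (feature, label) pairs). `pseudo_label` and `coarse_mask` are hypothetical names; the binarized grid is the kind of coarse mask that would then be handed to SAM2 as a prompt.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Database = List[Tuple[Vector, int]]  # (feature, label): 1 = camouflaged object, 0 = background


def pseudo_label(query: Sequence[float], db: Database, k: int = 3) -> float:
    """Soft foreground score in [0, 1]: mean label of the k nearest database entries."""
    if not db or k < 1:
        raise ValueError("need a non-empty database and k >= 1")
    nearest = sorted(db, key=lambda entry: math.dist(query, entry[0]))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def coarse_mask(grid: List[List[Vector]], db: Database, k: int = 3, threshold: float = 0.5) -> List[List[int]]:
    """Retrieve a pseudo-label for every patch in a feature grid, then binarize it."""
    return [[int(pseudo_label(f, db, k) >= threshold) for f in row] for row in grid]


if __name__ == "__main__":
    db = [([0.0, 0.0], 0), ([0.2, 0.0], 0), ([5.0, 5.0], 1), ([5.2, 5.0], 1)]
    print(coarse_mask([[[0.1, 0.0], [5.1, 5.0]]], db))  # [[0, 1]]
```

With `k=1` this reduces to nearest-entry labeling; larger `k` yields soft scores that the threshold then binarizes. A linear scan is enough for a compact database, which is one reason shrinking the database by clustering matters for laptop-scale inference.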
Wutao Liu
Nanjing University of Aeronautics and Astronautics
YiDan Wang
Nanjing University of Aeronautics and Astronautics
Pan Gao
Professor, Nanjing University of Aeronautics and Astronautics
Image/Video/Point clouds, deep learning, multimedia