First RAG, Second SEG: A Training-Free Paradigm for Camouflaged Object Detection

📅 2025-08-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Camouflaged object detection (COD) is highly challenging due to the extreme visual similarity between objects and their backgrounds. Existing methods rely heavily on large-scale training data and substantial computational resources, while foundation models (e.g., SAM) offer strong generalization but require meticulous fine-tuning and high-quality handcrafted prompts. This paper proposes RAG-SEG, a training-free two-stage framework: (1) an unsupervised clustering-based retrieval library generates pseudo-labels and coarse masks via feature retrieval; (2) retrieved prompts are fed into SAM2 for end-to-end precise segmentation. To our knowledge, this is the first work to introduce retrieval-augmented generation (RAG) into COD, eliminating both manual prompt engineering and model fine-tuning. RAG-SEG achieves competitive or superior performance against state-of-the-art methods across multiple standard COD benchmarks. All experiments were conducted on a single consumer laptop, demonstrating exceptional efficiency, deployability, and practicality.

📝 Abstract

Camouflaged object detection (COD) poses a significant challenge in computer vision due to the high similarity between objects and their backgrounds. Existing approaches often rely on heavy training and large computational resources. While foundation models such as the Segment Anything Model (SAM) offer strong generalization, they still struggle to handle COD tasks without fine-tuning and require high-quality prompts to yield good performance. However, generating such prompts manually is costly and inefficient. To address these challenges, we propose First RAG, Second SEG (RAG-SEG), a training-free paradigm that decouples COD into two stages: Retrieval-Augmented Generation (RAG) for generating coarse masks as prompts, followed by SAM-based segmentation (SEG) for refinement. RAG-SEG constructs a compact retrieval database via unsupervised clustering, enabling fast and effective feature retrieval. During inference, the retrieved features produce pseudo-labels that guide precise mask generation using SAM2. Our method eliminates the need for conventional training while maintaining competitive performance. Extensive experiments on benchmark COD datasets demonstrate that RAG-SEG performs on par with or surpasses state-of-the-art methods. Notably, all experiments are conducted on a personal laptop, highlighting the computational efficiency and practicality of our approach. We present further analysis in the Appendix, covering limitations, salient object detection extension, and possible improvements.
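
The retrieval stage described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes patch features and per-patch foreground labels are already available as NumPy arrays, uses plain k-means as the unsupervised clustering step, and stores a mean foreground score per cluster. The function names (`kmeans`, `build_database`, `retrieve_coarse_mask`) and the 0.5 threshold are hypothetical choices made for illustration.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid: (n, k).
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members):
                centroids[j] = members.mean(0)
    # Recompute assignments so they match the final centroids.
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, dist.argmin(1)


def build_database(features, labels, k):
    """Compress (feature, label) pairs into k centroids, each with a foreground score.

    Assumption: `labels` holds 1.0 for camouflaged-object patches and 0.0 otherwise.
    """
    centroids, assign = kmeans(features, k)
    scores = np.array([
        labels[assign == j].mean() if (assign == j).any() else 0.0
        for j in range(k)
    ])
    return centroids, scores


def retrieve_coarse_mask(query_features, centroids, scores, grid_hw, threshold=0.5):
    """Nearest-centroid lookup per query patch, reshaped into a coarse boolean mask."""
    dist = ((query_features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    pseudo_labels = scores[dist.argmin(1)]
    return pseudo_labels.reshape(grid_hw) >= threshold
```

Because the database keeps only `k` centroids rather than every training patch, lookup cost scales with `k`, which is one plausible reading of why a compact clustered database suits laptop-scale inference. The resulting coarse mask would then be passed to SAM2 as a prompt; that step is omitted here because it depends on the SAM2 interface.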

Problem

Research questions and friction points this paper is trying to address.

Detecting camouflaged objects without training requirements

Reducing computational resources for segmentation tasks

Eliminating manual prompt generation for SAM models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free paradigm combining RAG and SAM

Unsupervised clustering for compact retrieval database

Generates pseudo-labels for SAM2 segmentation refinement
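
As a rough illustration of how a retrieved coarse mask could replace a handcrafted prompt, the sketch below derives a bounding box and a single foreground point from a boolean mask. The summary only states that coarse masks serve as prompts; converting them to a box and a point is an assumption made here for illustration, and `mask_to_prompts` is a hypothetical helper, not part of SAM2 or of the paper.

```python
import numpy as np


def mask_to_prompts(coarse_mask):
    """Derive a box (x_min, y_min, x_max, y_max) and one foreground point (x, y).

    Returns None when the mask is empty, i.e. retrieval found no object.
    """
    ys, xs = np.nonzero(coarse_mask)
    if ys.size == 0:
        return None
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    # Pick the foreground pixel closest to the centroid, so the point is
    # guaranteed to lie inside the mask even for non-convex shapes.
    cy, cx = ys.mean(), xs.mean()
    i = int(np.argmin((ys - cy) ** 2 + (xs - cx) ** 2))
    return {"box": box, "point": (int(xs[i]), int(ys[i]))}
```

Snapping the point to an actual mask pixel, rather than using the raw centroid, avoids placing a positive prompt on background when the coarse mask is ring-shaped or fragmented.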
