🤖 AI Summary
This paper formally defines Concealed Dense Prediction (CDP)—a novel vision task addressing fine-grained, dense understanding of objects that are visually imperceptible due to extreme camouflage and background fusion—and identifies its core challenges. To tackle CDP, we propose a Masked Adversarial Classification framework, construct CvpINST—the first multimodal instruction dataset tailored for CDP—and introduce CvpAgent, an隐匿-perception agent integrating fine-grained representation learning, prior-knowledge injection, auxiliary reasoning, and visual agent architecture. We conduct a unified evaluation of 25 state-of-the-art methods across 12 concealed-scene benchmarks, establishing the first comprehensive CDP benchmark. Furthermore, we distill six frontier research directions, laying foundational theory and technical paradigms for camouflaged visual understanding in the era of large models.
📝 Abstract
Deep learning is developing rapidly and handling common computer vision tasks well. It is time to pay attention to more complex vision tasks, as model size, knowledge, and reasoning capabilities continue to improve. In this paper, we introduce and review a family of complex tasks, termed Concealed Dense Prediction (CDP), which has great value in agriculture, industry, etc. CDP's intrinsic trait is that the targets are concealed in their surroundings, thus fully perceiving them requires fine-grained representations, prior knowledge, auxiliary reasoning, etc. The contributions of this review are three-fold: (i) We introduce the scope, characteristics, and challenges specific to CDP tasks and emphasize their essential differences from generic vision tasks. (ii) We develop a taxonomy based on concealment counteracting to summarize deep learning efforts in CDP through experiments on three tasks. We compare 25 state-of-the-art methods across 12 widely used concealed datasets. (iii) We discuss the potential applications of CDP in the large model era and summarize 6 potential research directions. We offer perspectives for the future development of CDP by constructing a large-scale multimodal instruction fine-tuning dataset, CvpINST, and a concealed visual perception agent, CvpAgent.