AI Summary
To address the limited generalization capability of existing models to unseen categories in open-vocabulary camouflaged object segmentation, this paper proposes a classifier-centric adaptive framework. We introduce a lightweight text adapter and a hierarchical asymmetric initialization strategy to enhance vision-language semantic alignment. Furthermore, we jointly optimize the classifier's open-vocabulary reasoning ability via text-prompt-driven vision-language model (VLM) ensembling, adapter fine-tuning, and parameter initialization. On the OVCamo benchmark, our method achieves a cIoU of 0.493 (+0.050), a cSm of 0.658 (+0.079), and a cMAE of 0.239 (−0.097), significantly outperforming the OVCoser baseline. These results empirically validate that classifier enhancement plays a pivotal role in improving segmentation performance for unseen camouflage categories.
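The reported gains can be checked directly against the absolute values given in the abstract. The short sketch below recomputes each difference; the dictionary names are illustrative only, and the numbers are the ones quoted for the OVCoser baseline and the proposed method. Note that cMAE is an error measure, so a negative change is an improvement.

```python
# Metric values as quoted in the abstract (OVCamo benchmark).
baseline = {"cIoU": 0.443, "cSm": 0.579, "cMAE": 0.336}   # OVCoser baseline
proposed = {"cIoU": 0.493, "cSm": 0.658, "cMAE": 0.239}   # proposed method

# Absolute change per metric, rounded to the three decimals used in the text.
deltas = {name: round(proposed[name] - baseline[name], 3) for name in baseline}

for name, delta in deltas.items():
    print(f"{name}: {baseline[name]:.3f} -> {proposed[name]:.3f} ({delta:+.3f})")
```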
Abstract
Open-vocabulary camouflaged object segmentation requires models to segment camouflaged objects of arbitrary categories unseen during training, which places very high demands on generalization. An analysis of existing methods shows that the classification component significantly affects overall segmentation performance. Accordingly, a classifier-centric adaptive framework is proposed that improves the classification component through a lightweight text adapter with a novel layered asymmetric initialization. With this classification enhancement, the proposed method substantially improves segmentation metrics over the OVCoser baseline on the OVCamo benchmark: cIoU increases from 0.443 to 0.493, cSm increases from 0.579 to 0.658, and cMAE decreases from 0.336 to 0.239. These results demonstrate that targeted classification enhancement is an effective way to advance camouflaged object segmentation performance.
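To make the idea of a lightweight text adapter concrete, here is a minimal NumPy sketch of a residual bottleneck adapter applied to frozen text embeddings, followed by cosine-similarity classification. Everything in it is an assumption made for illustration: the abstract does not describe the adapter architecture or the layered asymmetric initialization, so the sketch substitutes a common adapter convention (random down-projection, zero up-projection) in which the adapter starts as an identity map. The class name `TextAdapter`, the helper `classify`, and all dimensions are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


class TextAdapter:
    """Hypothetical residual bottleneck adapter for frozen text embeddings."""

    def __init__(self, dim, bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        # Assumed asymmetric init (not taken from the paper): the down-projection
        # is small random noise, the up-projection is all zeros, so the adapter
        # initially leaves the text embeddings unchanged.
        self.w_down = rng.normal(0.0, 0.02, size=(dim, bottleneck))
        self.w_up = np.zeros((bottleneck, dim))

    def __call__(self, text_emb):
        hidden = np.maximum(text_emb @ self.w_down, 0.0)  # ReLU bottleneck
        return text_emb + hidden @ self.w_up              # residual connection


def classify(image_emb, text_emb, adapter):
    """Pick, for each image embedding, the class whose adapted text embedding is closest."""
    adapted = l2_normalize(adapter(text_emb))
    visual = l2_normalize(image_emb)
    logits = visual @ adapted.T  # cosine similarity, shape (num_images, num_classes)
    return logits.argmax(axis=-1)


# Toy data: 5 hypothetical class-prompt embeddings and 2 image embeddings
# constructed to lie close to classes 3 and 0.
rng = np.random.default_rng(1)
class_text_emb = rng.normal(size=(5, 16))
image_emb = class_text_emb[[3, 0]] + 0.01 * rng.normal(size=(2, 16))

adapter = TextAdapter(dim=16, bottleneck=4)
pred = classify(image_emb, class_text_emb, adapter)
print(pred)
```

Starting from an identity map means that, before any fine-tuning, classification matches what the frozen text encoder alone would give; training then only has to learn a correction. Whether the paper's initialization has this property is not stated in the abstract.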