🤖 AI Summary
Underwater camouflaged object detection and tracking perform poorly, largely because large-scale, task-specific datasets are lacking. To address this, the paper introduces UW-COT220, the first large-scale multimodal benchmark for underwater camouflaged object tracking, and systematically evaluates existing methods on it. Building on this benchmark, it proposes VL-SAM2, a vision-language tracking framework. VL-SAM2 is the first to adapt the video foundation model SAM2 to underwater tracking, and the evaluation shows that SAM2 is more robust than SAM in complex underwater environments. VL-SAM2 integrates text-guided mask refinement, cross-frame feature alignment, and vision-language prior modeling. Experiments show that VL-SAM2 achieves state-of-the-art performance on UW-COT220. The dataset and code are publicly released to support research on intelligent underwater perception.
📝 Abstract
Over the past decade, visual object tracking has advanced significantly, largely due to the availability of large-scale datasets. However, existing tracking datasets focus primarily on open-air scenarios, which greatly limits the development of object tracking in underwater environments. To bridge this gap, we propose UW-COT220, the first large-scale multimodal underwater camouflaged object tracking dataset. Using this dataset, we first comprehensively evaluate state-of-the-art visual object tracking methods, as well as SAM- and SAM2-based trackers, in challenging underwater environments. Our findings highlight the improvements of SAM2 over SAM, demonstrating its enhanced ability to handle the complexities of underwater camouflaged objects. Furthermore, we propose VL-SAM2, a novel vision-language tracking framework based on the video foundation model SAM2. Experimental results demonstrate that VL-SAM2 achieves state-of-the-art performance on the UW-COT220 dataset. The dataset and code are available at https://github.com/983632847/Awesome-Multimodal-Object-Tracking.