Towards Underwater Camouflaged Object Tracking: Benchmark and Baselines

📅 2024-09-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the performance limitations of underwater camouflaged object detection and tracking, which stem from the lack of large-scale, task-specific datasets, this paper introduces UW-COT220, the first large-scale multimodal benchmark for underwater camouflaged object tracking, and systematically evaluates existing methods on it. Building on this benchmark, the authors propose VL-SAM2, a vision-language collaborative tracking framework. VL-SAM2 is the first to adapt the video foundation model SAM2 to underwater tracking, and the evaluation shows that SAM2 is more robust than SAM in complex underwater environments. The framework integrates text-guided mask refinement, cross-frame feature alignment, and vision-language prior modeling. Experiments show that VL-SAM2 achieves state-of-the-art performance on UW-COT220. Both the dataset and code are publicly released, providing infrastructure for intelligent underwater perception research.

📝 Abstract

Over the past decade, significant progress has been made in visual object tracking, largely due to the availability of large-scale datasets. However, existing tracking datasets are primarily focused on open-air scenarios, which greatly limits the development of object tracking in underwater environments. To bridge this gap, we take a step forward by proposing the first large-scale multimodal underwater camouflaged object tracking dataset, namely UW-COT220. Based on the proposed dataset, this paper first comprehensively evaluates current advanced visual object tracking methods and SAM- and SAM2-based trackers in challenging underwater environments. Our findings highlight the improvements of SAM2 over SAM, demonstrating its enhanced ability to handle the complexities of underwater camouflaged objects. Furthermore, we propose a novel vision-language tracking framework called VL-SAM2, based on the video foundation model SAM2. Experimental results demonstrate that our VL-SAM2 achieves state-of-the-art performance on the UW-COT220 dataset. The dataset and code are available at https://github.com/983632847/Awesome-Multimodal-Object-Tracking.

Problem

Research questions and friction points this paper is trying to address.

Underwater Object Detection

Lack of Large Datasets

Performance Improvement

Innovation

Methods, ideas, or system contributions that make the work stand out.

UW-COT220 Dataset

SAM2 Optimization

VL-SAM2 Development

🔎 Similar Papers

Camouflaged Object Tracking: A Benchmark