Towards Underwater Camouflaged Object Tracking: Benchmark and Baselines

📅 2024-09-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Underwater camouflaged object detection and tracking have been limited by the lack of large-scale, task-specific datasets. To address this, the paper introduces UW-COT220, the first large-scale multimodal benchmark for underwater camouflaged object tracking, and systematically evaluates existing methods on it. Building on this benchmark, the authors propose VL-SAM2, a vision-language collaborative tracking framework. VL-SAM2 is the first to adapt the video foundation model SAM2 to underwater tracking, and the evaluation shows that SAM2 is more robust than SAM in complex underwater environments. The framework integrates text-guided mask refinement, cross-frame feature alignment, and vision-language prior modeling. Experiments show that VL-SAM2 achieves state-of-the-art performance on UW-COT220. Both the dataset and the code are publicly released as a resource for underwater perception research.

📝 Abstract
Over the past decade, significant progress has been made in visual object tracking, largely due to the availability of large-scale datasets. However, existing tracking datasets are primarily focused on open-air scenarios, which greatly limits the development of object tracking in underwater environments. To bridge this gap, we take a step forward by proposing the first large-scale multimodal underwater camouflaged object tracking dataset, namely UW-COT220. Based on the proposed dataset, this paper first comprehensively evaluates current advanced visual object tracking methods, along with SAM- and SAM2-based trackers, in challenging underwater environments. Our findings highlight the improvements of SAM2 over SAM, demonstrating its enhanced ability to handle the complexities of underwater camouflaged objects. Furthermore, we propose a novel vision-language tracking framework called VL-SAM2, based on the video foundation model SAM2. Experimental results demonstrate that VL-SAM2 achieves state-of-the-art performance on the UW-COT220 dataset. The dataset and code are available at https://github.com/983632847/Awesome-Multimodal-Object-Tracking.
Problem

Research questions and friction points this paper is trying to address.

Underwater Object Detection
Lack of Large Datasets
Performance Improvement
Innovation

Methods, ideas, or system contributions that make the work stand out.

UW-COT220 Dataset
SAM2 Optimization
VL-SAM2 Development
Chunhui Zhang
Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai 200240, China; CloudWalk Technology Co., Ltd, 201203, China
Li Liu
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511458, China
Guanjie Huang
The Hong Kong University of Science and Technology (Guangzhou)
Computer Science
Hao Wen
CloudWalk Technology Co., Ltd, 201203, China
Xi Zhou
CloudWalk Technology Co., Ltd, 201203, China
Yanfeng Wang
Shanghai Jiao Tong University