🤖 AI Summary
Existing open-domain video object segmentation methods suffer significant performance degradation in underwater environments due to color distortion, low contrast, and object camouflage, a problem further exacerbated by the lack of high-quality datasets. To address this, the work introduces UW-VOS, the first large-scale benchmark for underwater video object segmentation, and proposes SAM-U, a lightweight adaptation framework built upon SAM2. SAM-U integrates efficient adapters into the image encoder, enabling parameter-efficient domain transfer with only about 2% additional trainable parameters. Experiments show that state-of-the-art methods suffer an average drop of 13 points in J&F score on UW-VOS, whereas SAM-U achieves the current best performance. The study further identifies small objects, strong camouflage, and frequent object entry into and exit from the field of view as key challenges in underwater video object segmentation.
📝 Abstract
Underwater Video Object Segmentation (VOS) is essential for marine exploration, yet open-air methods suffer significant degradation due to color distortion, low contrast, and prevalent camouflage. A primary hurdle is the lack of high-quality training data. To bridge this gap, we introduce $\textbf{UW-VOS}$, the first large-scale underwater VOS benchmark, comprising 1,431 video sequences across 409 categories with 309,295 mask annotations, constructed via a semi-automatic data engine with rigorous human verification. We further propose $\textbf{SAM-U}$, a parameter-efficient framework that adapts SAM2 to the underwater domain. By inserting lightweight adapters into the image encoder, SAM-U achieves state-of-the-art performance with only $\sim$2$\%$ of parameters trainable. Extensive experiments reveal that existing methods suffer an average 13-point $\mathcal{J}\&\mathcal{F}$ drop on UW-VOS, while SAM-U effectively bridges this domain gap. Detailed attribute-based analysis further identifies small targets, camouflage, and object exit and re-entry as critical bottlenecks, providing a roadmap for future research in robust underwater perception.
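To make the parameter-efficiency claim concrete, the sketch below shows one *common* adapter design (bottleneck: down-projection, GELU, up-projection, residual add) and the resulting parameter budget. This is an illustrative assumption, not the paper's exact SAM-U adapter: the layer sizes (`d = 768`, `r = 16`) and the adapter placement are hypothetical, and real implementations would use a tensor library rather than plain Python.

```python
import math

def gelu(x):
    # GELU activation (tanh approximation), applied inside the bottleneck.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def adapter_forward(x, w_down, w_up):
    """Bottleneck adapter on one token embedding.

    x       -- list of d floats (a token feature from the frozen encoder)
    w_down  -- d x r down-projection weights
    w_up    -- r x d up-projection weights
    Returns x + W_up(GELU(W_down(x))): the residual keeps the frozen
    backbone's features intact when the adapter output is small.
    """
    d, r = len(w_down), len(w_up)
    hidden = [gelu(sum(x[i] * w_down[i][j] for i in range(d))) for j in range(r)]
    return [x[k] + sum(hidden[j] * w_up[j][k] for j in range(r)) for k in range(d)]

# Parameter budget: with embedding size d and bottleneck r << d, one adapter
# adds ~2*d*r weights, versus roughly 12*d^2 for a full transformer block --
# which is why only a few percent of parameters need training.
d, r = 768, 16  # hypothetical sizes for illustration
adapter_params = 2 * d * r          # 24,576
block_params = 12 * d * d           # ~7.1M
print(f"adapter adds ~{100 * adapter_params / block_params:.1f}% per block")
```

Only the adapter weights (and typically layer norms) would be optimized; the pretrained SAM2 encoder stays frozen, which is what keeps the trainable fraction near 2%.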