AI Summary
Current underwater instance segmentation methods are constrained by closed-set vocabularies, limiting their ability to recognize novel marine species. While open-vocabulary (OV) segmentation has advanced on natural images, its performance degrades significantly underwater due to severe color attenuation, structural distortion, and ill-defined semantic categories. To address this, we introduce MARIS, the first large-scale, fine-grained underwater open-vocabulary instance segmentation benchmark. We further propose a Geometric Prior Enhancement Module (GPEM) to explicitly model underwater structural degradation, and a Semantic Alignment Injection Mechanism (SAIM) to establish cross-domain semantic mappings. Our approach integrates geometric cues and domain-specific priors into the OV framework, effectively mitigating visual degradation and semantic misalignment. On MARIS, our method substantially outperforms existing OV baselines under both in-domain and cross-domain evaluation settings. This work establishes foundational data and algorithmic infrastructure for open-world perception in underwater environments.
Abstract
Most existing underwater instance segmentation approaches are constrained by closed-vocabulary prediction, limiting their ability to recognize novel marine categories. To support evaluation, we introduce MARIS (Marine Open-Vocabulary Instance Segmentation), the first large-scale fine-grained benchmark for underwater Open-Vocabulary (OV) segmentation, featuring a limited set of seen categories and diverse unseen categories. Although OV segmentation has shown promise on natural images, our analysis reveals that transfer to underwater scenes suffers from severe visual degradation (e.g., color attenuation) and semantic misalignment caused by the lack of underwater class definitions. To address these issues, we propose a unified framework with two complementary components. The Geometric Prior Enhancement Module (GPEM) leverages stable part-level and structural cues to maintain object consistency under degraded visual conditions. The Semantic Alignment Injection Mechanism (SAIM) enriches language embeddings with domain-specific priors, mitigating semantic ambiguity and improving recognition of unseen categories. Experiments show that our framework consistently outperforms existing OV baselines in both In-Domain and Cross-Domain settings on MARIS, establishing a strong foundation for future underwater perception research.
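To make the open-vocabulary idea concrete, the sketch below shows the generic classification step such frameworks typically rely on: a mask-level visual embedding is compared against text embeddings of candidate class names, and the closest one wins. It is not the paper's implementation. The `enrich` function, its `weight` parameter, the toy 3-dimensional vectors, and the class names are all hypothetical, and the blending of a class-name embedding with a "domain prior" embedding is only one assumed reading of what enriching language embeddings could look like.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def enrich(name_emb, prior_emb, weight=0.5):
    """Blend a class-name embedding with a domain-prior embedding.

    Hypothetical stand-in for semantic enrichment; the abstract does not
    specify how SAIM actually combines them.
    """
    mixed = [(1 - weight) * n + weight * p for n, p in zip(name_emb, prior_emb)]
    return normalize(mixed)


def classify(mask_emb, text_embs):
    """Return the label whose text embedding is closest to the mask embedding."""
    scores = {label: cosine(mask_emb, emb) for label, emb in text_embs.items()}
    return max(scores, key=scores.get), scores


# Toy embeddings (assumed values, for illustration only).
name_embs = {"clownfish": [1.0, 0.0, 0.0], "sea urchin": [0.0, 1.0, 0.0]}
prior_embs = {"clownfish": [0.8, 0.0, 0.6], "sea urchin": [0.0, 0.6, 0.8]}

text_embs = {
    label: enrich(name_embs[label], prior_embs[label]) for label in name_embs
}

mask_emb = [0.9, 0.1, 0.3]  # embedding of one predicted instance mask
label, scores = classify(mask_emb, text_embs)
print(label, scores)
```

Because the label set is just a dictionary of text embeddings, an unseen category can be added at inference time by embedding its name, with no retraining of the classifier head. That property is what separates open-vocabulary from closed-vocabulary prediction.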