MARIS: Marine Open-Vocabulary Instance Segmentation with Geometric Enhancement and Semantic Alignment

📅 2025-10-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current underwater instance segmentation methods are constrained by closed-set vocabularies, limiting their ability to recognize novel marine species. While open-vocabulary (OV) segmentation has advanced on natural images, its performance degrades significantly underwater due to severe color attenuation, structural distortion, and ill-defined semantic categories. To address this, we introduce MARIS—the first large-scale, fine-grained underwater open-vocabulary instance segmentation benchmark. We further propose a Geometric Prior Enhancement Module (GPEM) to explicitly model underwater structural degradation, and a Semantic Alignment Injection Mechanism (SAIM) to establish cross-domain semantic mappings. Our approach integrates geometric cues and domain-specific priors into the OV framework, effectively mitigating visual degradation and semantic misalignment. On MARIS, our method substantially outperforms existing OV baselines under both in-domain and cross-domain evaluation settings. This work establishes foundational data and algorithmic infrastructure for open-world perception in underwater environments.

📝 Abstract

Most existing underwater instance segmentation approaches are constrained by closed-vocabulary prediction, limiting their ability to recognize novel marine categories. To support evaluation, we introduce MARIS (Marine Open-Vocabulary Instance Segmentation), the first large-scale fine-grained benchmark for underwater Open-Vocabulary (OV) segmentation, featuring a limited set of seen categories and diverse unseen categories. Although OV segmentation has shown promise on natural images, our analysis reveals that transfer to underwater scenes suffers from severe visual degradation (e.g., color attenuation) and semantic misalignment caused by the lack of underwater class definitions. To address these issues, we propose a unified framework with two complementary components. The Geometric Prior Enhancement Module (GPEM) leverages stable part-level and structural cues to maintain object consistency under degraded visual conditions. The Semantic Alignment Injection Mechanism (SAIM) enriches language embeddings with domain-specific priors, mitigating semantic ambiguity and improving recognition of unseen categories. Experiments show that our framework consistently outperforms existing OV baselines in both In-Domain and Cross-Domain settings on MARIS, establishing a strong foundation for future underwater perception research.
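The abstract describes SAIM only at a high level ("enriches language embeddings with domain-specific priors"). For orientation, here is a minimal sketch of the generic open-vocabulary classification step that such a mechanism would feed into, under stated assumptions: each predicted region has an embedding, each class name is represented by a text embedding, and the region is labeled by a softmax over cosine similarities. As one assumed way of injecting priors, each class embedding is the renormalized mean of several prompt embeddings (prompt ensembling, as used with CLIP). This is not the authors' SAIM. The function names, the temperature value, and the toy 3-d vectors are hypothetical; real embeddings would come from a vision-language encoder.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (required for cosine similarity)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def build_class_embedding(prompt_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical prior injection: average the unit-normalized embeddings of
    several prompts for one class (e.g. the bare name plus extra underwater
    descriptions), then renormalize. Generic prompt ensembling, not SAIM."""
    if not prompt_embeddings:
        raise ValueError("need at least one prompt embedding")
    normalized = [l2_normalize(p) for p in prompt_embeddings]
    dim = len(normalized[0])
    mean = [sum(p[i] for p in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def classify_region(
    region_embedding: Sequence[float],
    class_embeddings: Dict[str, Vector],
    temperature: float = 0.01,  # assumed value; sharpens the softmax
) -> Dict[str, float]:
    """Return softmax probabilities over class names from cosine similarity
    between one region embedding and each (unit-length) class embedding."""
    r = l2_normalize(region_embedding)
    logits = {
        name: sum(a * b for a, b in zip(r, emb)) / temperature
        for name, emb in class_embeddings.items()
    }
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(logit - peak) for name, logit in logits.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}
```

Because the class set is just a dictionary of text embeddings, unseen categories can be added at inference time by embedding new prompts, which is the property an open-vocabulary benchmark such as MARIS is meant to test.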

Problem

Research questions and friction points this paper is trying to address.

Addresses underwater instance segmentation limited by closed-vocabulary constraints

Mitigates visual degradation and semantic misalignment in underwater scenes

Enhances geometric consistency and semantic alignment for novel categories

Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometric Prior Enhancement Module stabilizes object consistency

Semantic Alignment Injection Mechanism enriches language embeddings

Unified framework addresses visual degradation and semantic misalignment
