MARIS: Marine Open-Vocabulary Instance Segmentation with Geometric Enhancement and Semantic Alignment

📅 2025-10-17
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Current underwater instance segmentation methods are constrained by closed-set vocabularies, limiting their ability to recognize novel marine species. While open-vocabulary (OV) segmentation has advanced on natural images, its performance degrades significantly underwater due to severe color attenuation, structural distortion, and ill-defined semantic categories. To address this, we introduce MARIS, the first large-scale, fine-grained underwater open-vocabulary instance segmentation benchmark. We further propose a Geometric Prior Enhancement Module (GPEM) to explicitly model underwater structural degradation, and a Semantic Alignment Injection Mechanism (SAIM) to establish cross-domain semantic mappings. Our approach integrates geometric cues and domain-specific priors into the OV framework, effectively mitigating visual degradation and semantic misalignment. On MARIS, our method substantially outperforms existing OV baselines under both in-domain and cross-domain evaluation settings. This work establishes foundational data and algorithmic infrastructure for open-world perception in underwater environments.

πŸ“ Abstract
Most existing underwater instance segmentation approaches are constrained by closed-vocabulary prediction, limiting their ability to recognize novel marine categories. To support evaluation, we introduce MARIS (Marine Open-Vocabulary Instance Segmentation), the first large-scale fine-grained benchmark for underwater Open-Vocabulary (OV) segmentation, featuring a limited set of seen categories and diverse unseen categories. Although OV segmentation has shown promise on natural images, our analysis reveals that transfer to underwater scenes suffers from severe visual degradation (e.g., color attenuation) and semantic misalignment caused by the lack of underwater class definitions. To address these issues, we propose a unified framework with two complementary components. The Geometric Prior Enhancement Module (GPEM) leverages stable part-level and structural cues to maintain object consistency under degraded visual conditions. The Semantic Alignment Injection Mechanism (SAIM) enriches language embeddings with domain-specific priors, mitigating semantic ambiguity and improving recognition of unseen categories. Experiments show that our framework consistently outperforms existing OV baselines in both In-Domain and Cross-Domain settings on MARIS, establishing a strong foundation for future underwater perception research.
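The abstract does not specify how SAIM injects domain priors. As a minimal sketch of the general idea only, assuming text embeddings are already available from some encoder, the Python below blends each class-name embedding with the mean embedding of hypothetical underwater descriptions (a prompt-ensembling stand-in, not the paper's mechanism) and then classifies region embeddings by a softmax over cosine similarities, the usual open-vocabulary recipe. The function names, `alpha`, `temperature`, and the toy 3-D vectors are all illustrative assumptions.

```python
import numpy as np

# Hypothetical illustration of prior-enriched text embeddings for
# open-vocabulary classification; NOT the MARIS/SAIM implementation.

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def enrich_class_embeddings(name_emb, prior_embs, alpha=0.5):
    """Blend each class-name embedding with the mean of its prior embeddings.

    name_emb:   (C, D) embeddings of bare class names.
    prior_embs: list of C arrays, each (K_c, D), embeddings of hypothetical
                domain-specific descriptions for that class.
    alpha:      hypothetical mixing weight between name and prior.
    """
    name_emb = l2_normalize(name_emb)
    prior_means = np.stack([l2_normalize(p).mean(axis=0) for p in prior_embs])
    mixed = (1 - alpha) * name_emb + alpha * l2_normalize(prior_means)
    return l2_normalize(mixed)

def classify_regions(region_emb, class_emb, temperature=0.01):
    """Softmax over cosine similarities between regions and class texts."""
    logits = l2_normalize(region_emb) @ l2_normalize(class_emb).T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# Toy usage with made-up 3-D embeddings for two classes and two regions.
names = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
priors = [np.array([[0.8, 0.0, 0.6]]), np.array([[0.0, 0.8, 0.6]])]
classes = enrich_class_embeddings(names, priors)
regions = np.array([[0.7, 0.0, 0.7], [0.0, 0.6, 0.8]])
probs = classify_regions(regions, classes)
print(probs.argmax(axis=1))
```

Because class embeddings here come from text rather than per-class learned weights, recognizing an unseen category only requires new name and description embeddings; no classifier parameters change.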
Problem

Research questions and friction points this paper is trying to address.

Addresses underwater instance segmentation limited by closed-vocabulary constraints
Mitigates visual degradation and semantic misalignment in underwater scenes
Enhances geometric consistency and semantic alignment for novel categories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometric Prior Enhancement Module stabilizes object consistency
Semantic Alignment Injection Mechanism enriches language embeddings
Unified framework addresses visual degradation and semantic misalignment
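The abstract describes GPEM only at a high level. One simple structural cue that is stable under color attenuation, assumed here purely for illustration, is a per-channel gradient-magnitude map normalized by each channel's maximum: multiplying a channel by a positive constant (a crude model of wavelength-dependent light absorption) leaves the map essentially unchanged. The sketch then uses that map to reweight a feature map. `structure_map`, `enhance_features`, and `beta` are hypothetical names, and this is not the paper's GPEM.

```python
import numpy as np

# Hypothetical geometric-prior sketch; NOT the MARIS GPEM implementation.

def channel_gradient_magnitude(img):
    """Per-channel finite-difference gradient magnitude, shape (H, W, C)."""
    gy, gx = np.gradient(img.astype(np.float64), axis=(0, 1))
    return np.sqrt(gx ** 2 + gy ** 2)

def structure_map(img, eps=1e-8):
    """Average over channels of gradient magnitude normalized per channel.

    Dividing each channel by its own maximum cancels a positive per-channel
    scale factor (up to the small `eps` term).
    """
    mag = channel_gradient_magnitude(img)
    mag = mag / (mag.max(axis=(0, 1), keepdims=True) + eps)
    return mag.mean(axis=2)

def enhance_features(features, img, beta=1.0):
    """Reweight (H, W, D) features by the (H, W) structure map."""
    s = structure_map(img)
    return features * (1.0 + beta * s[..., None])

# Toy usage: simulate attenuation by scaling channels, red most strongly,
# since red light is absorbed fastest in water.
rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
attenuated = img * np.array([0.2, 0.7, 0.9])
s_clean = structure_map(img)
s_att = structure_map(attenuated)
print(np.allclose(s_clean, s_att, atol=1e-6))
```

Raw RGB values change under this scaling, whereas the normalized structure map does not, which is the kind of stability a geometric prior is meant to provide when color is unreliable.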
Authors
Bingyu Li (TeleAI, USTC)
Feiyu Wang (Fudan University; computer vision)
Da Zhang (NWPU, USTC)
Zhiyuan Zhao (USTC)
Junyu Gao (USTC)
Xuelong Li (USTC)