🤖 AI Summary
To address object mislocalization and imprecise boundaries in marine salient object segmentation under complex underwater scenes, this paper proposes DiffMSS, a diffusion-based framework with three key components: (1) a region-word similarity matching mechanism that picks out salient terms from text descriptions to localize fine-grained biological structures; (2) a consensus deterministic sampling strategy that suppresses overconfident missegmentations; and (3) an integrated design combining conditional diffusion modeling, region-level semantic knowledge distillation, and text-guided conditional feature learning. Evaluated across multiple benchmarks, DiffMSS consistently outperforms state-of-the-art methods, with improvements in F-measure, E-measure, and S-measure. Qualitative results further show more accurate localization and sharper object boundaries.
📝 Abstract
Marine Saliency Segmentation (MSS) plays a pivotal role in various vision-based marine exploration tasks. However, existing marine segmentation techniques suffer from object mislocalization and imprecise boundaries due to the complex underwater environment. Meanwhile, despite the impressive performance of diffusion models in visual segmentation, contextual semantics could be leveraged further to enhance feature learning of region-level salient objects and thereby improve segmentation outcomes. Building on this insight, we propose DiffMSS, a novel diffusion-based marine saliency segmenter that uses semantic knowledge distillation to guide the segmentation of marine salient objects. Specifically, we design a region-word similarity matching mechanism to identify salient terms at the word level from the text descriptions. These high-level semantic features guide the conditional feature learning network, through semantic knowledge distillation, in generating salient and accurate diffusion conditions. To further refine the segmentation of fine-grained structures in unique marine organisms, we develop a dedicated consensus deterministic sampling strategy to suppress overconfident missegmentations. Comprehensive experiments demonstrate the superior performance of DiffMSS over state-of-the-art methods in both quantitative and qualitative evaluations.
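As a rough illustration of what region-word similarity matching could look like, the sketch below scores each word of a caption by its best cosine similarity to any image region feature and keeps the top-scoring words as "salient terms". This is not the DiffMSS implementation: the function names (`cosine`, `match_salient_words`), the max-over-regions scoring rule, the `top_k` cutoff, and the toy three-dimensional features are all assumptions made for illustration. The abstract does not specify how the similarity is computed or how terms are selected.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_salient_words(region_feats, word_feats, words, top_k=2):
    """Hypothetical matching rule: score each word by its best-matching region.

    region_feats: list of region feature vectors
    word_feats:   list of word embedding vectors, aligned with `words`
    Returns the top_k (word, score) pairs, highest score first.
    """
    scores = []
    for word, w in zip(words, word_feats):
        best = max(cosine(r, w) for r in region_feats)
        scores.append((word, best))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy features (assumed): two regions, four caption words.
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
words = ["a", "turtle", "near", "coral"]
word_feats = [
    [0.0, 0.0, 1.0],  # "a": unrelated to both regions
    [0.9, 0.1, 0.0],  # "turtle": close to region 0
    [0.0, 0.0, 1.0],  # "near": unrelated to both regions
    [0.1, 0.9, 0.2],  # "coral": close to region 1
]
salient = match_salient_words(regions, word_feats, words, top_k=2)
print([w for w, _ in salient])
```

In a real system the region and word features would come from learned image and text encoders, and the selected terms would feed the conditional feature learning network. Here they only show the matching step in isolation.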