🤖 AI Summary
This work proposes HFP-SAM, a framework that addresses three challenges in marine animal segmentation: the difficulty of modeling long-range dependencies, insufficient perception of fine detail, and the absence of frequency-domain information. HFP-SAM integrates frequency-domain priors into the Segment Anything Model (SAM) through a Frequency-Guided Adapter (FGA), which injects marine-scene information into the frozen SAM backbone, and a Frequency-aware Point Selection (FPS) mechanism, which generates point prompts for SAM's decoder. It also incorporates a Full-View Mamba (FVM) module that extracts spatial and channel contextual information with linear computational complexity. Experiments on four public marine animal segmentation datasets show that HFP-SAM outperforms existing methods. The code is publicly available.
📝 Abstract
Marine Animal Segmentation (MAS) aims to identify and segment marine animals in complex marine environments. Most previous deep learning-based MAS methods struggle to model long-range dependencies. Recently, the Segment Anything Model (SAM) has gained popularity in general image segmentation. However, it lacks the ability to perceive fine-grained details and frequency information. To this end, we propose a novel learning framework, named Hierarchical Frequency Prompted SAM (HFP-SAM), for high-performance MAS. First, we design a Frequency Guided Adapter (FGA) to efficiently inject marine scene information into the frozen SAM backbone through frequency-domain prior masks. Additionally, we introduce a Frequency-aware Point Selection (FPS) module that generates highlighted regions through frequency analysis. These regions are combined with the coarse predictions of SAM to generate point prompts, which are fed into SAM's decoder to produce fine predictions. Finally, to obtain comprehensive segmentation masks, we introduce a Full-View Mamba (FVM) to efficiently extract spatial and channel contextual information with linear computational complexity. Extensive experiments on four public datasets demonstrate the superior performance of our approach. The source code is publicly available at https://github.com/Drchip61/TIP-HFP-SAM.
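To make the idea of frequency-aware point selection more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "highlighted regions" can be approximated by a high-pass filtered version of the image, and that point prompts are the strongest high-frequency locations inside a coarse mask. The function names `high_frequency_map` and `select_point_prompts`, the `cutoff` value, and the top-k rule are all hypothetical choices made for illustration.

```python
import numpy as np


def high_frequency_map(gray, cutoff=0.1):
    """Return a [0, 1] map of high-frequency detail for a 2-D grayscale image.

    Assumption: a hard high-pass filter in the Fourier domain stands in for
    the frequency analysis described in the abstract.
    """
    h, w = gray.shape
    spectrum = np.fft.fft2(gray)
    # Radial frequency (cycles per pixel) of every bin in the unshifted spectrum.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    spectrum[radius < cutoff] = 0  # drop low frequencies (hypothetical cutoff)
    detail = np.abs(np.fft.ifft2(spectrum))
    return detail / (detail.max() + 1e-8)


def select_point_prompts(gray, coarse_mask, k=3, cutoff=0.1):
    """Pick up to k (row, col) points with the strongest high-frequency
    response inside the coarse mask (hypothetical selection rule)."""
    score = high_frequency_map(gray, cutoff) * (coarse_mask > 0)
    order = np.argsort(score.ravel())[::-1][:k]
    rows, cols = np.unravel_index(order, score.shape)
    # Keep only points that actually carry a positive score.
    return [(int(r), int(c)) for r, c in zip(rows, cols) if score[r, c] > 0]


# Toy example: a textured patch inside a larger coarse mask.
image = np.zeros((32, 32))
yy, xx = np.mgrid[8:16, 8:16]
image[8:16, 8:16] = (yy + xx) % 2  # checkerboard texture
mask = np.zeros((32, 32), dtype=bool)
mask[4:20, 4:20] = True
points = select_point_prompts(image, mask, k=3)
```

In this sketch the returned points are in (row, column) order. SAM's prompt interface generally expects (x, y) coordinates, so the order would need to be swapped before passing such points to a SAM predictor.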