HFP-SAM: Hierarchical Frequency Prompted SAM for Efficient Marine Animal Segmentation

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes HFP-SAM, a novel framework addressing key challenges in marine animal segmentation—namely, the difficulty of modeling long-range dependencies, insufficient detail perception, and the absence of frequency-domain information. HFP-SAM is the first to integrate frequency-domain priors into the Segment Anything Model (SAM), introducing a Frequency-Guided Adapter (FGA) to inject marine-scene semantics and a Frequency-aware Point Selection (FPS) mechanism to generate high-quality prompts. Additionally, it incorporates a Full-View Mamba (FVM) module with linear computational complexity to efficiently fuse spatial-channel contextual information. Extensive experiments on four public marine animal segmentation datasets demonstrate that HFP-SAM significantly outperforms existing methods. The code has been made publicly available.
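The frequency-domain prior injection described above can be sketched as a low-pass prior mask that residually modulates frozen backbone features. This is a minimal illustration of the idea, not the authors' implementation: the function names, `cutoff`, and `scale` are assumptions.

```python
import numpy as np

def frequency_prior_mask(image, cutoff=0.1):
    # Center the 2-D spectrum so low frequencies sit in the middle.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    keep = np.zeros((h, w))
    cy, cx = h // 2, w // 2
    ry, rx = max(1, int(h * cutoff)), max(1, int(w * cutoff))
    keep[cy - ry:cy + ry, cx - rx:cx + rx] = 1.0
    # Low-pass reconstruction: a smooth map of the coarse scene layout.
    low = np.abs(np.fft.ifft2(np.fft.ifftshift(spectrum * keep)))
    # Normalize to [0, 1] so it can act as a soft prior mask.
    return (low - low.min()) / (low.max() - low.min() + 1e-8)

def frequency_guided_adapter(features, prior, scale=0.1):
    # Residual modulation: frozen features pass through unchanged while the
    # frequency prior adds a small, scene-dependent correction.
    return features + scale * features * prior

# Toy usage: a bright square on a dark background.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
prior = frequency_prior_mask(img)
feats = np.ones((32, 32))
adapted = frequency_guided_adapter(feats, prior)
```

The residual form matters: because the backbone stays frozen, the adapter can only nudge features, which keeps SAM's general segmentation ability intact while adding marine-scene cues.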

📝 Abstract
Marine Animal Segmentation (MAS) aims to identify and segment marine animals in complex marine environments. Most previous deep learning-based MAS methods struggle to model long-distance dependencies. Recently, the Segment Anything Model (SAM) has gained popularity in general image segmentation; however, it lacks the ability to perceive fine-grained details and frequency information. To this end, we propose a novel learning framework, named Hierarchical Frequency Prompted SAM (HFP-SAM), for high-performance MAS. First, we design a Frequency Guided Adapter (FGA) to efficiently inject marine scene information into the frozen SAM backbone through frequency-domain prior masks. Additionally, we introduce a Frequency-aware Point Selection (FPS) mechanism that highlights regions through frequency analysis; these regions are combined with SAM's coarse predictions to generate point prompts, which are fed into SAM's decoder for fine predictions. Finally, to obtain comprehensive segmentation masks, we introduce a Full-View Mamba (FVM) to efficiently extract spatial and channel contextual information with linear computational complexity. Extensive experiments on four public datasets demonstrate the superior performance of our approach. The source code is publicly available at https://github.com/Drchip61/TIP-HFP-SAM.
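The frequency-aware point-prompt idea can be sketched as: high-pass filter the image, keep responses that fall inside the coarse foreground prediction, and take the strongest responses as point prompts. This is a hedged sketch under simple assumptions; `num_points`, `cutoff`, and the filter design are illustrative, not the paper's code.

```python
import numpy as np

def frequency_point_prompts(image, coarse_mask, num_points=3, cutoff=0.1):
    # 2-D FFT, shifted so low frequencies sit at the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # High-pass filter: zero out a central low-frequency square.
    h, w = image.shape
    cy, cx = h // 2, w // 2
    ry, rx = max(1, int(h * cutoff)), max(1, int(w * cutoff))
    spectrum[cy - ry:cy + ry, cx - rx:cx + rx] = 0

    # Back to the spatial domain: magnitude highlights edges and fine detail.
    high_freq = np.abs(np.fft.ifft2(np.fft.ifftshift(spectrum)))

    # Keep only responses inside the coarse foreground prediction.
    high_freq = high_freq * (coarse_mask > 0.5)

    # Strongest responses become (row, col) point prompts.
    flat = np.argsort(high_freq.ravel())[::-1][:num_points]
    return np.stack(np.unravel_index(flat, high_freq.shape), axis=1)

# Toy usage: a bright square whose edges carry the high-frequency energy.
img = np.zeros((64, 64))
img[20:40, 20:40] = 1.0
points = frequency_point_prompts(img, coarse_mask=img)
```

Selected points cluster on object boundaries inside the coarse mask, which is exactly where a decoder benefits most from an extra prompt when refining a blurry marine object.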
Problem

Research questions and friction points this paper is trying to address.

Marine Animal Segmentation
Long-distance modeling
Frequency information
Fine-grained details
Segment Anything Model
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency-aware prompting
Segment Anything Model (SAM)
Marine animal segmentation
Mamba architecture
Frequency domain adaptation
Pingping Zhang
School of Future Technology, Dalian University of Technology
Tianyu Yan
School of Future Technology, Dalian University of Technology
Yuhao Wang
Dalian University of Technology
Computer Vision · Multi-modal Fusion · ReID
Yang Liu
Dalian University of Technology
Computer Vision · Image Processing
Tongdan Tang
Central Hospital of Dalian University of Technology
Yili Ma
Central Hospital of Dalian University of Technology
Long Lv
Affiliated Zhongshan Hospital of Dalian University
Feng Tian
Affiliated Zhongshan Hospital of Dalian University
Weibing Sun
Affiliated Zhongshan Hospital of Dalian University
Huchuan Lu
School of Information and Communication Engineering, Dalian University of Technology