MGD-SAM2: Multi-view Guided Detail-enhanced Segment Anything Model 2 for High-Resolution Class-agnostic Segmentation

📅 2025-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address SAM2's limitations in high-resolution, class-agnostic image segmentation—namely low mask accuracy, heavy reliance on manual prompts, and loss of fine-grained details—this paper proposes a multi-view collaborative guidance framework. Methodologically, it introduces: (1) MPAdapter for cross-scale adaptive feature alignment; (2) MCEM for complementary multi-view enhancement; (3) HMIM for hierarchical semantic-detail interaction; and (4) DRM for high-resolution mask refinement. Notably, this is the first work to embed multi-view perception into the SAM2 architecture while preserving class-agnostic capability. Experiments demonstrate significant improvements in boundary sharpness and structural integrity: on multiple high- and normal-resolution benchmarks, the method consistently outperforms baseline models in mAP and Boundary F-score, exhibiting strong generalization. The code is publicly available.

📝 Abstract
Segment Anything Models (SAMs), as vision foundation models, have demonstrated remarkable performance across various image analysis tasks. Despite their strong generalization capabilities, SAMs struggle with fine-grained detail in high-resolution class-agnostic segmentation (HRCS), due to limitations in directly processing high-resolution inputs, low-resolution mask predictions, and reliance on accurate manual prompts. To address these limitations, we propose MGD-SAM2, which integrates SAM2 with multi-view feature interaction between a global image and local patches to achieve precise segmentation. MGD-SAM2 incorporates the pre-trained SAM2 with four novel modules: the Multi-view Perception Adapter (MPAdapter), the Multi-view Complementary Enhancement Module (MCEM), the Hierarchical Multi-view Interaction Module (HMIM), and the Detail Refinement Module (DRM). Specifically, we first introduce MPAdapter to adapt the SAM2 encoder for enhanced extraction of local details and global semantics in HRCS images. Then, MCEM and HMIM are proposed to further exploit local texture and global context by aggregating multi-view features within and across multiple scales. Finally, DRM is designed to generate gradually restored high-resolution mask predictions, compensating for the loss of fine-grained details that results from directly upsampling low-resolution prediction maps. Experimental results demonstrate the superior performance and strong generalization of our model on multiple high-resolution and normal-resolution datasets. Code will be available at https://github.com/sevenshr/MGD-SAM2.
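The multi-view input scheme described above (a global image plus local patches fed to the adapted encoder) can be sketched minimally as follows. This is an illustration of the general idea only, assuming naive stride-based downsampling and non-overlapping tiles; the function names are hypothetical and do not reflect the paper's actual implementation.

```python
import numpy as np

def multiview_inputs(image: np.ndarray, patch_grid: int = 2, view_size: int = 4):
    """Split a high-resolution image into one global view plus local patches.

    Illustrative sketch of a multi-view input scheme: the global view
    supplies coarse semantics, while full-resolution local patches
    preserve fine-grained texture. Stride-based downsampling stands in
    for whatever resizing the real pipeline uses.
    """
    h, w = image.shape[:2]
    # Global view: naive stride-based downsample to view_size x view_size.
    global_view = image[:: h // view_size, :: w // view_size][:view_size, :view_size]
    # Local views: non-overlapping patch_grid x patch_grid tiles at full resolution.
    ph, pw = h // patch_grid, w // patch_grid
    patches = [
        image[i * ph : (i + 1) * ph, j * pw : (j + 1) * pw]
        for i in range(patch_grid)
        for j in range(patch_grid)
    ]
    return global_view, patches
```

In the paper's framework, both streams would pass through the shared SAM2 encoder (adapted by MPAdapter) before MCEM/HMIM fuse their features; the sketch stops at input preparation.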
Problem

Research questions and friction points this paper is trying to address.

Enhances fine-grained detail segmentation in high-resolution images
Reduces reliance on manual prompts for segmentation accuracy
Improves mask prediction resolution without direct upsampling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-view feature interaction for precise segmentation
Four novel modules enhance SAM2 capabilities
Gradually restored high-resolution mask predictions
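The "gradually restored" idea in the last bullet can be illustrated with a toy staged upsampler: instead of one large jump from the low-resolution mask to full resolution, resolution is doubled in stages. This sketch uses plain nearest-neighbor repetition; the paper's DRM is a learned refinement module, which is not modeled here.

```python
import numpy as np

def progressive_upsample(mask: np.ndarray, steps: int = 2) -> np.ndarray:
    """Restore a low-resolution mask in stages rather than one big jump.

    Each stage doubles both spatial dimensions via nearest-neighbor
    repetition. A learned detail-refinement module (like DRM) would
    additionally correct the mask at each stage.
    """
    out = mask
    for _ in range(steps):
        out = out.repeat(2, axis=0).repeat(2, axis=1)
    return out
```

The point of staging is that each intermediate resolution gives a refinement module a chance to recover boundary detail before the next enlargement, rather than amplifying a single coarse prediction.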
Haoran Shen
University of Science and Technology Beijing
semantic segmentation, classification
Peixian Zhuang
Key Laboratory of Knowledge Automation for Industrial Processes, Ministry of Education, the School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Jiahao Kou
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Yuxin Zeng
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Haoying Xu
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Jiangyun Li
Key Laboratory of Knowledge Automation for Industrial Processes, Ministry of Education, the School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; Shunde Graduate School of University of Science and Technology Beijing, China