Boosting Segment Anything Model to Generalize Visually Non-Salient Scenarios

📅 2026-01-02
🏛️ IEEE Transactions on Image Processing
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the significant performance degradation of the Segment Anything Model (SAM) in visually non-salient scenarios, where foreground and background exhibit low contrast. To mitigate this limitation, the authors propose VNS-SAM, which incorporates a Mask-Edge Token Interactive decoder and a Non-Salient Feature Mining module. These components effectively enhance SAM’s perception of non-salient objects with minimal additional parameters and computational overhead, while preserving its zero-shot generalization capability. The study also introduces VNS-SEG, the first unified benchmark dataset dedicated to multi-class non-salient segmentation. Experimental results demonstrate that VNS-SAM achieves superior performance across diverse non-salient segmentation tasks, notably outperforming baseline methods under zero-shot settings. The added parameters can be optimized within four hours, and both the model and dataset are publicly released.

📝 Abstract
Segment Anything Model (SAM), known for its remarkable zero-shot segmentation capabilities, has garnered significant attention in the community. Nevertheless, its performance is challenged when dealing with what we refer to as visually non-salient scenarios, where there is low contrast between the foreground and background. In these cases, existing methods often cannot capture accurate contours and fail to produce promising segmentation results. In this paper, we propose Visually Non-Salient SAM (VNS-SAM), aiming to enhance SAM’s perception of visually non-salient scenarios while preserving its original zero-shot generalizability. We achieve this by effectively exploiting SAM’s low-level features through two designs: a Mask-Edge Token Interactive decoder and a Non-Salient Feature Mining module. These designs help the SAM decoder gain a deeper understanding of non-salient characteristics with only marginal parameter increments and computational requirements. The additional parameters of VNS-SAM can be optimized within 4 hours, demonstrating its feasibility and practicality. In terms of data, we establish VNS-SEG, a unified dataset for various VNS scenarios with more than 35K images, in contrast to previous single-task adaptations. It is designed to make the model learn more robust VNS features and to comprehensively benchmark the model’s segmentation performance and generalizability in VNS scenarios. Extensive experiments across various VNS segmentation tasks demonstrate the superior performance of VNS-SAM, particularly under zero-shot settings, highlighting its potential for broad real-world applications. Codes and datasets are publicly available at https://guangqian-guo.github.io/VNS-SAM/.
Problem

Research questions and friction points this paper is trying to address.

visually non-salient
segmentation
Segment Anything Model
low contrast
zero-shot generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Visually Non-Salient Segmentation
Segment Anything Model
Zero-shot Generalization
Feature Mining
Edge-Aware Decoder
Guangqian Guo
Unmanned System Research Institute at Northwestern Polytechnical University, Xi’an 710072, China
Pengfei Chen
School of Electronic, Electrical, and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China
Yong Guo
Max Planck Institute for Informatics
AIGC · Model Compression · Image Restoration · Computer Vision
Huafeng Chen
Unmanned System Research Institute at Northwestern Polytechnical University, Xi’an 710072, China
Boqiang Zhang
Tencent AI Lab
Shan Gao
Unmanned System Research Institute at Northwestern Polytechnical University, Xi’an 710072, China