Global-Local Feature Decoding with Adapter-Guided SAMv2 for Salient Object Detection

📅 2026-05-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two problems: vision foundation models remain underused for salient object detection, and fully fine-tuning them is costly and prone to overfitting. The authors propose GLASSNet, a framework that freezes the SAMv2 encoder and adds a lightweight, spatially aware convolutional adapter, so that less than 3% of the encoder parameters are learnable. GLASSNet uses two decoders, one modeling global semantics and one modeling local details, and fuses their features to produce the final saliency maps. Experiments on multiple benchmarks for both salient and camouflaged object detection show that the method surpasses state-of-the-art approaches.
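To make the "less than 3% learnable" figure concrete, the following is a minimal sketch of the arithmetic behind a frozen-encoder-plus-adapter parameter budget. The function name `learnable_fraction` and both parameter counts are hypothetical placeholders chosen for illustration; they are not taken from the paper, which may count parameters differently.

```python
def learnable_fraction(frozen_encoder_params: int, adapter_params: int) -> float:
    """Share of encoder-side parameters that would receive gradient updates.

    Assumes the encoder is fully frozen and only the adapter is trained.
    """
    total = frozen_encoder_params + adapter_params
    if total <= 0:
        raise ValueError("parameter counts must sum to a positive number")
    return adapter_params / total


# Hypothetical counts for illustration only, NOT the paper's numbers.
encoder_params = 200_000_000  # frozen backbone weights
adapter_params = 5_000_000    # trainable adapter weights

fraction = learnable_fraction(encoder_params, adapter_params)
print(f"learnable share: {fraction:.2%}")
```

Under these assumed counts the adapter accounts for roughly 2.4% of the encoder-side parameters, which is the kind of ratio the summary describes.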

📝 Abstract

Salient Object Detection (SOD) remains an essential yet underexplored task in the era of large-scale vision models. Although foundation models like SAM exhibit strong generalization, their potential for SOD is not fully realized, and training or fully fine-tuning them is computationally expensive and prone to overfitting under limited data. To overcome these challenges, we introduce GLASSNet, a Global-Local feature decoding framework that uses SAMv2 as a frozen encoder paired with a lightweight, spatially aware convolutional adapter, reducing the number of learnable encoder parameters by over 97%. To enhance saliency quality, GLASSNet employs a dual-decoder architecture: one decoder captures global, long-range semantics with an expanded receptive field, while the other captures fine local details such as edges and textures. Fusing these complementary cues yields saliency maps that combine global coherence with local precision, producing accurate final masks. Extensive experiments on standard SOD and camouflaged object detection benchmarks show that GLASSNet surpasses state-of-the-art methods, demonstrating the power of frozen foundation models combined with targeted adaptation and global-local decoding.
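The contrast between an "expanded receptive field" and a local one can be illustrated with the standard receptive-field formula for stacked convolutions. The abstract does not say how GLASSNet widens the global decoder's receptive field; dilation is used here only as one common mechanism, and the layer configurations and names (`receptive_field`, `local_branch`, `global_branch`) are assumptions made for this sketch.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of stacked conv layers.

    `layers` is a list of (kernel, stride, dilation) tuples, ordered from
    input to output.
    """
    rf = 1
    jump = 1  # distance in input pixels between neighbouring outputs of the previous layer
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


# Hypothetical branches with the same depth and kernel size.
local_branch = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]   # plain 3x3 convs
global_branch = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]  # increasing dilation

print("local receptive field:", receptive_field(local_branch))
print("global receptive field:", receptive_field(global_branch))
```

With the same number of weights per layer, the dilated stack in this sketch sees a 15-pixel window against 7 pixels for the plain stack, which is the sort of trade-off a global decoder and a local decoder would split between them.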

Problem

Research questions and friction points this paper is trying to address.

Salient Object Detection

Foundation Models

Parameter Efficiency

Overfitting

Computational Cost

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapter-Guided SAMv2

Global-Local Feature Decoding

Salient Object Detection