SAMSOD: Rethinking SAM Optimization for RGB-T Salient Object Detection

📅 2025-10-04
🤖 AI Summary
In RGB-T salient object detection (SOD), a severe convergence imbalance between the RGB and thermal infrared modalities, coupled with strong gradient conflicts between high- and low-activation regions, hinders performance. To address these challenges, this paper proposes a SAM-based multimodal collaborative optimization framework. First, single-modal supervision is introduced to strengthen learning for the weaker modality. Second, a gradient deconfliction mechanism is designed to reduce the effect of conflicting gradients on convergence. Third, a dual-decoupled adapter is constructed to separately regulate high-activation (foreground) and low-activation (background) neurons, enhancing background modeling to better highlight salient objects. Experiments on multiple RGB-T SOD benchmarks are reported as state of the art. The framework is also reported to generalize to other tasks, including scribble-supervised SOD, RGB-D SOD, and rail defect detection.

📝 Abstract
RGB-T salient object detection (SOD) aims to segment attractive objects by combining RGB and thermal infrared images. To enhance performance, the Segment Anything Model (SAM) has been fine-tuned for this task. However, the imbalanced convergence of the two modalities and the significant gradient difference between high- and low-activations are ignored, leaving room for further performance improvement. In this paper, we propose a model called SAMSOD, which uses unimodal supervision to enhance the learning of the non-dominant modality and employs gradient deconfliction to reduce the impact of conflicting gradients on model convergence. The method also leverages two decoupled adapters to separately mask high- and low-activation neurons, emphasizing foreground objects by enhancing background learning. Fundamental experiments on RGB-T SOD benchmark datasets, together with generalizability experiments on scribble-supervised RGB-T SOD, fully supervised RGB-D SOD, and fully supervised RGB-D rail surface defect detection, all demonstrate the effectiveness of the proposed method.
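The abstract does not say how neurons are divided into high- and low-activation groups, so the following is only a rough sketch of one possible reading. It assumes a mean-value threshold and two complementary binary masks. The function name `split_by_activation` and the thresholding rule are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def split_by_activation(
    acts: Sequence[float], threshold: Optional[float] = None
) -> Tuple[List[float], List[float]]:
    """Split activations into a high group and a low group.

    ASSUMPTION: the mean activation is used as the threshold when none
    is given. The paper may use a different criterion.
    """
    if threshold is None:
        threshold = sum(acts) / len(acts)
    # 1.0 marks a high-activation neuron, 0.0 marks a low-activation one.
    high_mask = [1.0 if a > threshold else 0.0 for a in acts]
    # The low mask is the exact complement of the high mask.
    low_mask = [1.0 - m for m in high_mask]
    high = [a * m for a, m in zip(acts, high_mask)]  # low neurons zeroed out
    low = [a * m for a, m in zip(acts, low_mask)]    # high neurons zeroed out
    return high, low


if __name__ == "__main__":
    high, low = split_by_activation([0.1, 0.9, 0.4, 0.6])
    print(high, low)
```

Under this reading, each of the two branches would receive one of the two masked views, so that the low-activation (background) neurons are processed without being dominated by the high-activation ones.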
Problem

Research questions and friction points this paper is trying to address.

Addresses modality imbalance in RGB-T salient object detection optimization
Reduces gradient conflicts between high and low activation neurons
Enhances foreground segmentation by improving background learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unimodal supervision enhances non-dominant modality learning
Gradient deconfliction reduces conflicting gradient impacts
Decoupled adapters separately mask high and low activations
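
The page does not describe how gradient deconfliction is carried out. As an illustration of the general idea only, the sketch below uses a projection rule in the style of PCGrad: when two gradients have a negative dot product, each one has its component along the other removed before the two are summed. This is a swapped-in, generic technique, not the authors' confirmed method, and the function names are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out_conflict(g: Sequence[float], other: Sequence[float]) -> List[float]:
    """Return g with its component along `other` removed if they conflict.

    Two gradients conflict when their dot product is negative.
    Non-conflicting gradients are returned unchanged.
    """
    d = dot(g, other)
    norm_sq = dot(other, other)
    if d >= 0 or norm_sq == 0.0:
        return list(g)
    scale = d / norm_sq
    return [x - scale * y for x, y in zip(g, other)]


def deconflict(g_a: Sequence[float], g_b: Sequence[float]) -> List[float]:
    """Combine two gradients after removing their mutual conflict.

    ASSUMPTION: g_a and g_b stand for gradients of two objectives on the
    same parameters (for example, two groups of activations).
    """
    a = project_out_conflict(g_a, g_b)
    b = project_out_conflict(g_b, g_a)
    return [x + y for x, y in zip(a, b)]


if __name__ == "__main__":
    # Conflicting pair: dot product is -1.
    print(deconflict([1.0, 0.0], [-1.0, 1.0]))
    # Non-conflicting pair: plain sum.
    print(deconflict([1.0, 0.0], [1.0, 1.0]))
```

After projection, each adjusted gradient is orthogonal to the original gradient it conflicted with, so the update no longer directly undoes progress on the other objective.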
Authors
Zhengyi Liu
Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, China
Xinrui Wang
Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, China
Xianyong Fang
Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, China
Zhengzheng Tu
Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, China
Linbo Wang
University of Toronto