🤖 AI Summary
In RGB-T salient object detection (SOD), a severe convergence imbalance between the RGB and thermal infrared modalities, coupled with strong gradient conflicts between high- and low-activation regions, critically hinders performance. To address these challenges, this paper proposes a SAM-based multimodal collaborative optimization framework. First, unimodal supervision is introduced to strengthen learning for the weaker (non-dominant) modality. Second, a gradient deconfliction mechanism is designed to reduce the impact of conflicting gradients on convergence. Third, two decoupled adapters separately mask high-activation (foreground) and low-activation (background) neurons, enhancing background modeling so that salient objects stand out more clearly. Experiments show the method is effective on multiple RGB-T SOD benchmarks. The framework also generalizes to other downstream tasks, including scribble-supervised RGB-T SOD, RGB-D SOD, and RGB-D rail surface defect detection.
📝 Abstract
RGB-T salient object detection (SOD) aims to segment attractive objects by combining RGB and thermal infrared images. To enhance performance, the Segment Anything Model (SAM) has been fine-tuned for this task. However, existing approaches ignore the imbalanced convergence of the two modalities and the significant gradient difference between high and low activations, leaving room for further improvement. In this paper, we propose a model called SAMSOD, which uses unimodal supervision to enhance the learning of the non-dominant modality and employs gradient deconfliction to reduce the impact of conflicting gradients on model convergence. The method also leverages two decoupled adapters to separately mask high- and low-activation neurons, emphasizing foreground objects by enhancing background learning. Experiments on RGB-T SOD benchmark datasets, together with generalizability experiments on scribble-supervised RGB-T SOD, fully supervised RGB-D SOD, and fully supervised RGB-D rail surface defect detection, demonstrate the effectiveness of the proposed method.
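
The abstract does not say how gradient deconfliction is computed or which gradients are treated as conflicting. As an illustration only, the sketch below uses a projection rule in the style of PCGrad, a swapped-in technique that may differ from what SAMSOD does: when two gradients have a negative inner product, each has its component along the other removed before they are summed. The function names and the flat-list gradient representation are hypothetical and chosen for readability.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out_conflict(g: Sequence[float], other: Sequence[float]) -> List[float]:
    """Return g with its component along `other` removed, but only if they conflict.

    Two gradients are treated as conflicting when their inner product is negative.
    This is a PCGrad-style rule used as a stand-in; it is an assumption,
    not the deconfliction rule confirmed by the abstract.
    """
    inner = dot(g, other)
    norm_sq = dot(other, other)
    if inner >= 0.0 or norm_sq == 0.0:
        return list(g)  # no conflict (or zero gradient): leave g unchanged
    scale = inner / norm_sq
    return [x - scale * y for x, y in zip(g, other)]


def deconflicted_update(g_a: Sequence[float], g_b: Sequence[float]) -> List[float]:
    """Combine two gradients after removing their mutually conflicting parts."""
    g_a_adj = project_out_conflict(g_a, g_b)
    g_b_adj = project_out_conflict(g_b, g_a)
    return [x + y for x, y in zip(g_a_adj, g_b_adj)]


if __name__ == "__main__":
    # Hypothetical toy gradients from two objectives that partly disagree.
    g_a = [1.0, 0.0]
    g_b = [-1.0, 1.0]
    print(deconflicted_update(g_a, g_b))
```

In this toy case the combined update no longer has a negative inner product with either input gradient, which is the property the projection is meant to provide. In a real training loop the vectors would be flattened parameter gradients from separate backward passes.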