KAN-SAM: Kolmogorov-Arnold Network Guided Segment Anything Model for RGB-T Salient Object Detection

📅 2025-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address limited generalization and inefficient multimodal representation learning in RGB-thermal salient object detection (RGB-T SOD), this paper proposes KAN-SAM, a framework that integrates Kolmogorov-Arnold Network (KAN) adapters into Segment Anything Model 2 (SAM2) and uses thermal image features as prompts that guide segmentation. A mutually exclusive random masking strategy reduces reliance on RGB inputs and improves generalization across scenarios. A multimodal prompt fusion mechanism combines complementary cues from both modalities. On mainstream RGB-T SOD benchmarks, KAN-SAM outperforms state-of-the-art methods in detection accuracy and robustness, particularly in complex, cluttered, or low-contrast scenes.

📝 Abstract
Existing RGB-thermal salient object detection (RGB-T SOD) methods identify visually salient objects by combining RGB and thermal modalities, which makes them robust in complex scenarios. However, they often generalize poorly because available datasets lack diversity and because constructing multimodal representations is inefficient. In this paper, we propose KAN-SAM, a prompt learning-based RGB-T SOD method that demonstrates the potential of visual foundation models for RGB-T SOD. Specifically, we extend Segment Anything Model 2 (SAM2) to RGB-T SOD by introducing thermal features as guiding prompts through efficient and accurate Kolmogorov-Arnold Network (KAN) adapters, which enhance RGB representations and improve robustness. Furthermore, we introduce a mutually exclusive random masking strategy that reduces reliance on RGB data and improves generalization. Experimental results on benchmarks show superior performance over state-of-the-art methods.
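The abstract does not specify the internals of the KAN adapters. As a minimal sketch, assuming the general KAN idea that every input-output edge carries its own learnable univariate function, the code below builds one such layer and uses it to turn a thermal feature vector into a prompt that is added residually to the matching RGB feature vector. The Gaussian radial basis used for each edge function (in place of the B-splines of the original KAN formulation), the class `KANLayer`, and the function `thermal_prompt_adapter` are hypothetical illustrations, not KAN-SAM's implementation.

```python
import math
import random


def silu(x: float) -> float:
    """SiLU activation x * sigmoid(x); the tanh form avoids exp overflow."""
    return x * 0.5 * (1.0 + math.tanh(0.5 * x))


class KANLayer:
    """Minimal KAN layer (hypothetical sketch).

    Each edge (input i -> output j) has its own learnable function
        phi_ji(x) = w_base[j][i] * silu(x) + sum_k coef[j][i][k] * rbf_k(x).
    Assumption: Gaussian RBFs on a fixed grid replace the B-splines of the
    original KAN formulation.
    """

    def __init__(self, in_dim, out_dim, num_basis=8,
                 grid_min=-2.0, grid_max=2.0, seed=0):
        rng = random.Random(seed)
        self.in_dim, self.out_dim = in_dim, out_dim
        step = (grid_max - grid_min) / (num_basis - 1)
        self.centers = [grid_min + k * step for k in range(num_basis)]
        self.width = step
        # One base weight and one basis-coefficient vector per edge.
        self.w_base = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        self.coef = [[[rng.gauss(0.0, 0.1) for _ in range(num_basis)]
                      for _ in range(in_dim)] for _ in range(out_dim)]

    def phi(self, j, i, x):
        """Learnable univariate function on the edge from input i to output j."""
        rbf = sum(c * math.exp(-((x - g) / self.width) ** 2)
                  for c, g in zip(self.coef[j][i], self.centers))
        return self.w_base[j][i] * silu(x) + rbf

    def __call__(self, x):
        assert len(x) == self.in_dim
        # Each output is the sum of edge functions over all inputs.
        return [sum(self.phi(j, i, x[i]) for i in range(self.in_dim))
                for j in range(self.out_dim)]


def thermal_prompt_adapter(rgb_token, thermal_token, kan):
    """Hypothetical adapter: map a thermal token to a prompt with a KAN layer
    and add it residually to the RGB token of the same location."""
    prompt = kan(thermal_token)
    return [r + p for r, p in zip(rgb_token, prompt)]


if __name__ == "__main__":
    kan = KANLayer(in_dim=4, out_dim=4)
    enhanced = thermal_prompt_adapter([0.1, 0.2, 0.3, 0.4],
                                      [1.0, -0.5, 0.0, 0.7], kan)
    print(len(enhanced))  # 4
```

The residual form means that an adapter whose weights are all zero leaves the RGB features untouched, which is one common reason adapters are attached this way to a pretrained backbone such as SAM2. In a real model the same idea would operate on batched feature maps rather than single Python lists.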
Problem

Research questions and friction points this paper is trying to address.

Improve generalization in RGB-T salient object detection
Improve multi-modal representation efficiency in RGB-T SOD
Integrate thermal features with SAM2 using KAN adapters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends SAM2 with thermal prompts via KAN adapters
Uses mutually exclusive random masking for generalization
Enhances RGB representations with thermal features
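The source does not describe how the mutually exclusive masks are sampled. One plausible reading, sketched here as an assumption, is that each image patch may be masked in the RGB input or in the thermal input but never in both, so every location stays visible in at least one modality. The functions `mutually_exclusive_masks` and `apply_mask` and the masking ratios are hypothetical.

```python
import random


def mutually_exclusive_masks(num_patches, rgb_ratio=0.3, thermal_ratio=0.3,
                             seed=None):
    """Return (rgb_mask, thermal_mask) as lists of bools (True = masked).

    Assumption: masks are drawn per patch and are disjoint, so no patch is
    hidden in both modalities at once.
    """
    if rgb_ratio + thermal_ratio > 1.0:
        raise ValueError("ratios must sum to at most 1 for disjoint masks")
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    # Floor the counts so that n_rgb + n_thermal never exceeds num_patches.
    n_rgb = int(rgb_ratio * num_patches)
    n_thermal = int(thermal_ratio * num_patches)
    rgb_idx = set(order[:n_rgb])
    thermal_idx = set(order[n_rgb:n_rgb + n_thermal])
    rgb_mask = [i in rgb_idx for i in range(num_patches)]
    thermal_mask = [i in thermal_idx for i in range(num_patches)]
    return rgb_mask, thermal_mask


def apply_mask(patches, mask, fill=0.0):
    """Replace every masked patch (a list of values) with a constant fill."""
    return [[fill] * len(p) if m else p for p, m in zip(patches, mask)]


if __name__ == "__main__":
    rgb_m, th_m = mutually_exclusive_masks(16, rgb_ratio=0.25,
                                           thermal_ratio=0.25, seed=0)
    print(sum(rgb_m), sum(th_m))  # 4 4
    print(any(a and b for a, b in zip(rgb_m, th_m)))  # False
```

Setting `rgb_ratio` above `thermal_ratio` would push a model to rely more on thermal cues, in line with the stated goal of reducing dependence on RGB data; the specific values above are placeholders, not settings from the paper.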
Xingyuan Li
State Key Laboratory for Novel Software Technology, Nanjing University
Ruichao Hou
Nanjing University
Information Fusion, Multimedia Computing
Tongwei Ren
Nanjing University
Multimedia Computing
Gangshan Wu
State Key Laboratory for Novel Software Technology, Nanjing University