Modality-Guided Dynamic Graph Fusion and Temporal Diffusion for Self-Supervised RGB-T Tracking

📅 2025-05-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In self-supervised RGB-T tracking, erroneous pseudo-labels (caused by target occlusion, background noise, and interference from visually similar objects) lead to inefficient modality fusion and degraded performance. To address this, we propose a modality-guided dynamic graph fusion and temporal graph-informed diffusion framework. Methodologically, we model inter-frame modality features as noise and employ a graph-guided diffusion model for generative denoising. We further introduce dynamic graph attention and an Adjacency Matrix Generator (AMG) to capture cross-modal temporal dependencies, augmented by self-supervised contrastive learning to enhance representation robustness. Evaluated on four public RGB-T tracking benchmarks, our approach consistently outperforms state-of-the-art methods, achieving significant gains in tracking accuracy and markedly improved resilience to distractors and occlusions.

📝 Abstract
To reduce the reliance on large-scale annotations, self-supervised RGB-T tracking approaches have garnered significant attention. However, erroneous pseudo-labels may omit the object region or introduce background noise, which degrades the efficiency of modality fusion, while pseudo-label noise triggered by visually similar objects can further degrade tracking performance. In this paper, we propose GDSTrack, a novel approach that introduces dynamic graph fusion and temporal diffusion to address the above challenges in self-supervised RGB-T tracking. GDSTrack dynamically fuses the modalities of neighboring frames, treats their features as distractor noise, and leverages the denoising capability of a generative model. Specifically, the proposed Modality-guided Dynamic Graph Fusion (MDGF) module constructs a dynamic adjacency matrix via an Adjacency Matrix Generator (AMG) and uses it to guide graph attention, focusing on and fusing the object's coherent regions. Temporal Graph-Informed Diffusion (TGID) models MDGF features from neighboring frames as interference, thereby improving robustness against similar-object noise. Extensive experiments conducted on four public RGB-T tracking datasets demonstrate that GDSTrack outperforms existing state-of-the-art methods. The source code is available at https://github.com/LiShenglana/GDSTrack.
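
The core mechanism described above, an AMG-produced dynamic adjacency matrix that steers graph attention over concatenated RGB and thermal tokens, can be illustrated with a minimal PyTorch sketch. This is not the released implementation: the similarity-based AMG, the token shapes, and the logit-bias fusion below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjacencyMatrixGenerator(nn.Module):
    """Stand-in AMG: builds a soft adjacency matrix from pairwise feature similarity."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, tokens):                       # tokens: (B, N, C)
        q = self.proj(tokens)
        sim = torch.bmm(q, q.transpose(1, 2)) / q.shape[-1] ** 0.5
        return sim.softmax(dim=-1)                   # (B, N, N) dynamic adjacency

class ModalityGuidedGraphFusion(nn.Module):
    """Sketch of an MDGF-style block: graph attention over concatenated RGB/T
    tokens, with the attention logits biased by the AMG adjacency."""
    def __init__(self, dim):
        super().__init__()
        self.amg = AdjacencyMatrixGenerator(dim)
        self.qkv = nn.Linear(dim, dim * 3)
        self.out = nn.Linear(dim, dim)

    def forward(self, rgb_tokens, tir_tokens):       # each: (B, N, C)
        x = torch.cat([rgb_tokens, tir_tokens], dim=1)       # (B, 2N, C)
        adj = self.amg(x)                                      # (B, 2N, 2N)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.bmm(q, k.transpose(1, 2)) / q.shape[-1] ** 0.5
        # bias attention toward edges the dynamic adjacency deems coherent
        attn = F.softmax(attn + adj.clamp_min(1e-6).log(), dim=-1)
        return self.out(torch.bmm(attn, v))                    # fused tokens
```

In this sketch the log of the adjacency acts as an additive attention bias, so token pairs that the AMG scores as belonging to coherent object regions dominate the fusion.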
Problem

Research questions and friction points this paper is trying to address.

Reducing reliance on large-scale annotations in RGB-T tracking
Addressing modality fusion inefficiency from pseudo-label errors
Improving robustness against similar-object noise in tracking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic graph fusion for modality integration
Temporal diffusion to reduce noise interference (see the sketch after this list)
Adjacency matrix guides graph attention focus
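
As a companion to the temporal-diffusion idea above, the sketch below shows one plausible way to treat neighboring-frame fused features as a noisy signal and train a DDPM-style noise predictor conditioned on the current frame. The FeatureDenoiser module, the linear beta schedule, and the conditioning scheme are assumptions for illustration, not the paper's TGID implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDenoiser(nn.Module):
    """Tiny conditional noise predictor: given diffused neighbor-frame features,
    a timestep, and current-frame fused features as condition, predict the
    injected noise (DDPM-style epsilon prediction)."""
    def __init__(self, dim, steps=1000):
        super().__init__()
        self.steps = steps
        self.time_emb = nn.Embedding(steps, dim)
        self.net = nn.Sequential(
            nn.Linear(dim * 2, dim * 2), nn.GELU(), nn.Linear(dim * 2, dim)
        )
        # linear beta schedule -> cumulative alphas of the forward process
        betas = torch.linspace(1e-4, 0.02, steps)
        self.register_buffer("alpha_bar", torch.cumprod(1.0 - betas, dim=0))

    def forward(self, noisy, t, cond):               # noisy, cond: (B, N, C); t: (B,)
        h = noisy + self.time_emb(t)[:, None, :]
        return self.net(torch.cat([h, cond], dim=-1))

def diffusion_loss(model, neighbor_feat, current_feat):
    """One training step: diffuse the neighbor-frame fused features and learn
    to recover the noise while conditioning on the current-frame features."""
    B = neighbor_feat.shape[0]
    t = torch.randint(0, model.steps, (B,), device=neighbor_feat.device)
    eps = torch.randn_like(neighbor_feat)
    a = model.alpha_bar[t][:, None, None]
    noisy = a.sqrt() * neighbor_feat + (1 - a).sqrt() * eps
    return F.mse_loss(model(noisy, t, current_feat), eps)
```

A training loop would call diffusion_loss(model, mdgf_feats_prev, mdgf_feats_cur) and backpropagate; at inference the learned denoiser can be used to suppress distractor-like components in the fused features.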
Shenglan Li
School of Computer Sciences and Technology, China University of Mining and Technology
Rui Yao
School of Computer Sciences and Technology, China University of Mining and Technology
Yong Zhou
School of Computer Sciences and Technology, China University of Mining and Technology
Hancheng Zhu
School of Computer Sciences and Technology, China University of Mining and Technology
Kunyang Sun
School of Computer Sciences and Technology, China University of Mining and Technology
Bing Liu
School of Computer Sciences and Technology, China University of Mining and Technology
Zhiwen Shao
School of Computer Sciences and Technology, China University of Mining and Technology
Jiaqi Zhao
Xidian University