Modality-Guided Dynamic Graph Fusion and Temporal Diffusion for Self-Supervised RGB-T Tracking

📅 2025-05-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In self-supervised RGB-T tracking, erroneous pseudo-labels (caused by target occlusion, background noise, and interference from visually similar objects) lead to inefficient modality fusion and degraded performance. To address this, we propose a modality-guided dynamic graph fusion and temporal graph-informed diffusion framework. Methodologically, we model inter-frame modality features as noise and employ a graph-guided diffusion model for generative denoising. We further introduce dynamic graph attention and an Adjacency Matrix Generator (AMG) to capture cross-modal temporal dependencies, augmented by self-supervised contrastive learning to enhance representation robustness. Evaluated on four public RGB-T tracking benchmarks, our approach consistently outperforms state-of-the-art methods, achieving significant gains in tracking accuracy and markedly improved resilience to distractors and occlusions.

📝 Abstract
To reduce the reliance on large-scale annotations, self-supervised RGB-T tracking approaches have garnered significant attention. However, erroneous pseudo-labels may omit the object region or introduce background noise, which degrades the efficiency of modality fusion, while pseudo-label noise triggered by visually similar objects can further degrade tracking performance. In this paper, we propose GDSTrack, a novel approach that introduces dynamic graph fusion and temporal diffusion to address the above challenges in self-supervised RGB-T tracking. GDSTrack dynamically fuses the modalities of neighboring frames, treats their features as distractor noise, and leverages the denoising capability of a generative model. Specifically, the proposed Modality-guided Dynamic Graph Fusion (MDGF) module constructs a dynamic adjacency matrix via an Adjacency Matrix Generator (AMG) and uses it to guide graph attention, focusing on and fusing the object's coherent regions. Temporal Graph-Informed Diffusion (TGID) models MDGF features from neighboring frames as interference, thereby improving robustness against similar-object noise. Extensive experiments conducted on four public RGB-T tracking datasets demonstrate that GDSTrack outperforms existing state-of-the-art methods. The source code is available at https://github.com/LiShenglana/GDSTrack.
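
The core mechanism described above, an AMG-produced dynamic adjacency matrix that steers graph attention over concatenated RGB and thermal tokens, can be illustrated with a minimal PyTorch sketch. This is not the released implementation: the similarity-based AMG, the token shapes, and the logit-bias fusion below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjacencyMatrixGenerator(nn.Module):
    """Stand-in AMG: builds a soft adjacency matrix from pairwise feature similarity."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, tokens):                       # tokens: (B, N, C)
        q = self.proj(tokens)
        sim = torch.bmm(q, q.transpose(1, 2)) / q.shape[-1] ** 0.5
        return sim.softmax(dim=-1)                   # (B, N, N) dynamic adjacency

class ModalityGuidedGraphFusion(nn.Module):
    """Sketch of an MDGF-style block: graph attention over concatenated RGB/T
    tokens, with the attention logits biased by the AMG adjacency."""
    def __init__(self, dim):
        super().__init__()
        self.amg = AdjacencyMatrixGenerator(dim)
        self.qkv = nn.Linear(dim, dim * 3)
        self.out = nn.Linear(dim, dim)

    def forward(self, rgb_tokens, tir_tokens):       # each: (B, N, C)
        x = torch.cat([rgb_tokens, tir_tokens], dim=1)       # (B, 2N, C)
        adj = self.amg(x)                                      # (B, 2N, 2N)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.bmm(q, k.transpose(1, 2)) / q.shape[-1] ** 0.5
        # bias attention toward edges the dynamic adjacency deems coherent
        attn = F.softmax(attn + adj.clamp_min(1e-6).log(), dim=-1)
        return self.out(torch.bmm(attn, v))                    # fused tokens
```

In this sketch the log of the adjacency acts as an additive attention bias, so token pairs that the AMG scores as belonging to coherent object regions dominate the fusion.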
Problem

Research questions and friction points this paper is trying to address.

Reducing reliance on large-scale annotations in RGB-T tracking
Addressing modality fusion inefficiency from pseudo-label errors
Improving robustness against similar-object noise in tracking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic graph fusion for modality integration
Temporal diffusion to reduce noise interference (see the sketch after this list)
Adjacency matrix guides graph attention focus
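
As a companion to the temporal-diffusion idea above, the sketch below shows one plausible way to treat neighboring-frame fused features as a noisy signal and train a DDPM-style noise predictor conditioned on the current frame. The FeatureDenoiser module, the linear beta schedule, and the conditioning scheme are assumptions for illustration, not the paper's TGID implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDenoiser(nn.Module):
    """Tiny conditional noise predictor: given diffused neighbor-frame features,
    a timestep, and current-frame fused features as condition, predict the
    injected noise (DDPM-style epsilon prediction)."""
    def __init__(self, dim, steps=1000):
        super().__init__()
        self.steps = steps
        self.time_emb = nn.Embedding(steps, dim)
        self.net = nn.Sequential(
            nn.Linear(dim * 2, dim * 2), nn.GELU(), nn.Linear(dim * 2, dim)
        )
        # linear beta schedule -> cumulative alphas of the forward process
        betas = torch.linspace(1e-4, 0.02, steps)
        self.register_buffer("alpha_bar", torch.cumprod(1.0 - betas, dim=0))

    def forward(self, noisy, t, cond):               # noisy, cond: (B, N, C); t: (B,)
        h = noisy + self.time_emb(t)[:, None, :]
        return self.net(torch.cat([h, cond], dim=-1))

def diffusion_loss(model, neighbor_feat, current_feat):
    """One training step: diffuse the neighbor-frame fused features and learn
    to recover the noise while conditioning on the current-frame features."""
    B = neighbor_feat.shape[0]
    t = torch.randint(0, model.steps, (B,), device=neighbor_feat.device)
    eps = torch.randn_like(neighbor_feat)
    a = model.alpha_bar[t][:, None, None]
    noisy = a.sqrt() * neighbor_feat + (1 - a).sqrt() * eps
    return F.mse_loss(model(noisy, t, current_feat), eps)
```

A training loop would call diffusion_loss(model, mdgf_feats_prev, mdgf_feats_cur) and backpropagate; at inference the learned denoiser can be used to suppress distractor-like components in the fused features.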
Shenglan Li
School of Computer Sciences and Technology, China University of Mining and Technology
Rui Yao
School of Computer Sciences and Technology, China University of Mining and Technology
Yong Zhou
School of Computer Sciences and Technology, China University of Mining and Technology
Hancheng Zhu
School of Computer Sciences and Technology, China University of Mining and Technology
Kunyang Sun
School of Computer Sciences and Technology, China University of Mining and Technology
Bing Liu
School of Computer Sciences and Technology, China University of Mining and Technology
Zhiwen Shao
School of Computer Sciences and Technology, China University of Mining and Technology
Jiaqi Zhao
Xidian University