Revisiting RGBT Tracking Benchmarks from the Perspective of Modality Validity: A New Benchmark, Problem, and Solution

📅 2024-04-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing RGB-T tracking benchmarks under-represent multi-modal warranting (MMW) scenarios, such as extreme illumination or thermal truncation, in which either the RGB or the thermal infrared (TIR) modality becomes invalid. To address this gap, the paper introduces MV-RGBT, the first RGB-T tracking benchmark built explicitly around modality validity, covering 19 challenging scenes and 36 object categories and partitioned into subsets according to which modality remains valid. It further poses the novel problem of "when to fuse" modalities and proposes MoETrack, a mixture-of-experts tracker in which each expert produces independent tracking results together with a confidence score that guides fusion. Experiments show that MoETrack achieves state-of-the-art performance on MV-RGBT, GTOT, and LasHeR, supporting the conclusion that fusion is not always beneficial, especially under MMW conditions.

📝 Abstract
RGBT tracking draws increasing attention because of its robustness in multi-modal warranting (MMW) scenarios, such as nighttime and adverse weather conditions, where relying on a single sensing modality fails to ensure stable tracking results. However, existing benchmarks predominantly contain videos collected in common scenarios where both RGB and thermal infrared (TIR) information are of sufficient quality. This weakens the representativeness of existing benchmarks in severe imaging conditions, leading to tracking failures in MMW scenarios. To bridge this gap, we present a new benchmark considering modality validity, MV-RGBT, captured specifically from MMW scenarios where either the RGB (extreme illumination) or the TIR (thermal truncation) modality is invalid. Accordingly, it is divided into two subsets according to the valid modality, offering a new compositional perspective for evaluation and providing valuable insights for future designs. Moreover, MV-RGBT is the most diverse benchmark of its kind, featuring 36 different object categories captured across 19 distinct scenes. Furthermore, considering the severe imaging conditions in MMW scenarios, a new problem is posed in RGBT tracking, named `when to fuse', to stimulate the development of fusion strategies for such scenarios. To facilitate its discussion, we propose a new solution with a mixture of experts, named MoETrack, where each expert generates independent tracking results along with a confidence score. Extensive results demonstrate the significant potential of MV-RGBT in advancing RGBT tracking and elicit the conclusion that fusion is not always beneficial, especially in MMW scenarios. Besides, MoETrack achieves state-of-the-art results on several benchmarks, including MV-RGBT, GTOT, and LasHeR. Github: https://github.com/Zhangyong-Tang/MVRGBT.
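The abstract describes MoETrack as a mixture of experts in which each expert produces an independent tracking result plus a confidence score, so that fusion can be skipped when one modality is invalid. A minimal sketch of that selection idea is shown below; the names, the three-expert setup, and the argmax selection rule are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of confidence-guided expert selection, assuming each
# expert (e.g. RGB-only, TIR-only, fused) reports a candidate bounding box
# and a self-assessed confidence. The final prediction follows the most
# confident expert, so fusion is not mandatory when a modality is invalid.
from dataclasses import dataclass


@dataclass
class ExpertOutput:
    name: str          # expert identifier (illustrative)
    box: tuple         # (x, y, w, h) candidate bounding box
    confidence: float  # self-reported reliability in [0, 1]


def select_prediction(outputs):
    """Return the name and box of the most confident expert."""
    best = max(outputs, key=lambda o: o.confidence)
    return best.name, best.box


# Example: the TIR stream is truncated, so its expert reports low
# confidence and the RGB expert's box is kept instead of the fused one.
outputs = [
    ExpertOutput("rgb", (10, 12, 40, 30), 0.91),
    ExpertOutput("tir", (55, 60, 20, 15), 0.23),
    ExpertOutput("fused", (18, 20, 35, 28), 0.64),
]
name, box = select_prediction(outputs)
print(name, box)  # -> rgb (10, 12, 40, 30)
```

A rule like this operationalizes "when to fuse": the fused output is used only when it is actually the most reliable candidate, rather than being applied unconditionally.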
Problem

Research questions and friction points this paper is trying to address.

Addresses lack of representativeness in RGBT tracking benchmarks for severe imaging conditions.
Introduces MV-RGBT benchmark focusing on modality validity in multi-modal warranting scenarios.
Proposes 'when to fuse' problem and MoETrack solution for improved RGBT tracking performance.
Innovation

Methods, ideas, or system contributions that make the work stand out.

New benchmark MV-RGBT for RGBT tracking
Mixture of experts solution named MoETrack
Focus on modality validity in severe conditions
Authors

Zhangyong Tang (Jiangnan University)
Tianyang Xu (School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, P.R. China)
Zhenhua Feng (Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, GU2 7XH, UK)
Xuefeng Zhu (School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, P.R. China)
Chunyang Cheng (School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, P.R. China)
Xiao-Jun Wu (School of Artificial Intelligence and Computer Science, Jiangnan University): artificial intelligence, pattern recognition, machine learning
Josef Kittler (University of Surrey): engineering