RGB-D Tracking via Hierarchical Modality Aggregation and Distribution Network

📅 2023-12-06
🏛️ ACM Multimedia Asia
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing RGB-D tracking methods typically fuse bimodal features at a single level, which limits robustness and slows inference. To address these limitations, this paper proposes a Hierarchical Modality Aggregation and Distribution network (HMAD), the first framework to enable cross-level collaborative modeling of RGB and depth features. HMAD uses a dual-stream neural network to extract multi-level features, applies a cross-level attention mechanism for modality-adaptive weighted fusion, and introduces an adaptive depth feature calibration and distribution module that accounts for both modality heterogeneity and hierarchical complementarity. Evaluated on multiple standard RGB-D benchmark datasets, HMAD achieves state-of-the-art (SOTA) performance with real-time inference speed exceeding 32 FPS. It improves tracking robustness, generalization, and resilience to interference, particularly in challenging scenarios involving occlusion, illumination variation, and sensor noise.
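
To make the idea of modality-adaptive weighted fusion across feature levels more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it replaces the cross-level attention mentioned in the summary with a much simpler per-level softmax gate, and the function names (`fuse_level`, `hierarchical_fuse`), the scoring rule, and the toy feature values are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_level(rgb_feat, depth_feat):
    """Fuse one level's RGB and depth feature vectors with adaptive weights.

    ASSUMPTION: each modality is scored by its mean squared activation;
    this scoring rule is illustrative and not taken from the paper.
    """
    rgb_score = sum(v * v for v in rgb_feat) / len(rgb_feat)
    depth_score = sum(v * v for v in depth_feat) / len(depth_feat)
    w_rgb, w_depth = softmax([rgb_score, depth_score])
    fused = [w_rgb * r + w_depth * d for r, d in zip(rgb_feat, depth_feat)]
    return fused, (w_rgb, w_depth)


def hierarchical_fuse(rgb_levels, depth_levels):
    """Apply the per-level fusion to every level of a feature hierarchy."""
    return [fuse_level(r, d) for r, d in zip(rgb_levels, depth_levels)]


# Toy two-level hierarchy (made-up numbers standing in for backbone features).
rgb_levels = [[0.9, 0.1, 0.4], [0.2, 0.8]]
depth_levels = [[0.3, 0.3, 0.3], [0.7, 0.6]]
for level, (fused, weights) in enumerate(hierarchical_fuse(rgb_levels, depth_levels)):
    print(level, [round(v, 3) for v in fused], [round(w, 3) for w in weights])
```

Each level gets its own pair of weights, so a modality that is uninformative at one level can still dominate at another. A real tracker would learn these weights from data and operate on spatial feature maps rather than flat vectors.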

📝 Abstract
The integration of dual-modal features has been pivotal in advancing RGB-Depth (RGB-D) tracking. However, current trackers are less efficient and focus solely on single-level features, resulting in weaker robustness in fusion and slower speeds that fail to meet the demands of real-world applications. In this paper, we introduce a novel network, denoted as HMAD (Hierarchical Modality Aggregation and Distribution), which addresses these challenges. HMAD leverages the distinct feature representation strengths of RGB and depth modalities, giving prominence to a hierarchical approach for feature distribution and fusion, thereby enhancing the robustness of RGB-D tracking. Experimental results on various RGB-D datasets demonstrate that HMAD achieves state-of-the-art performance. Moreover, real-world experiments further validate HMAD’s capacity to effectively handle a spectrum of tracking challenges in real-time scenarios.

Problem

Research questions and friction points this paper is trying to address.

Improves RGB-D tracking robustness via hierarchical feature fusion
Addresses inefficiency in current RGB-D trackers' single-level features
Enables real-time performance for diverse tracking challenges

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical modality aggregation and distribution network
Leverages RGB and depth feature strengths
Enhances robustness in real-time tracking

Authors

Boyue Xu
Nanjing University

Yi Xu
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China

Ruichao Hou
Nanjing University
Information Fusion, Multimedia Computing

Jia Bei
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China

Tongwei Ren
Nanjing University
Multimedia Computing

Gangshan Wu
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China