🤖 AI Summary
Existing RGB-D tracking methods typically rely on single-level bimodal feature fusion, which limits robustness and slows inference. To address these limitations, this paper proposes the Hierarchical Modality Aggregation and Distribution network (HMAD), the first framework to enable cross-level collaborative modeling of RGB and depth features. HMAD uses a dual-stream network to extract multi-level features, applies a cross-level attention mechanism for modality-adaptive weighted fusion, and introduces an adaptive depth feature calibration and distribution module that jointly accounts for modality heterogeneity and hierarchical complementarity. Evaluated on multiple standard RGB-D benchmarks, HMAD achieves state-of-the-art (SOTA) performance at real-time inference speeds exceeding 32 FPS, and it markedly improves tracking robustness, generalization, and interference resilience, particularly under occlusion, illumination variation, and sensor noise.
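To make the fusion idea concrete, below is a minimal PyTorch sketch of modality-adaptive weighted fusion over a multi-level feature pyramid. This is an illustration under our own assumptions, not the authors' implementation: `CrossLevelModalityFusion`, its gating branches, and the depth-calibration convolutions are hypothetical stand-ins, and the paper's cross-level attention is simplified here to per-level channel gating.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLevelModalityFusion(nn.Module):
    """Illustrative sketch: fuse multi-level RGB and depth feature maps
    with modality-adaptive weights (not the authors' exact design)."""

    def __init__(self, channels: int, num_levels: int = 3):
        super().__init__()
        # One gating branch per pyramid level: predicts a scalar weight
        # for each modality from the pooled, concatenated features.
        self.gates = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(2 * channels, 2, kernel_size=1),
            )
            for _ in range(num_levels)
        )
        # Lightweight calibration of the depth stream before fusion,
        # standing in for the paper's depth feature calibration module.
        self.depth_calib = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            for _ in range(num_levels)
        )

    def forward(self, rgb_feats, depth_feats):
        """rgb_feats / depth_feats: lists of (B, C, H_l, W_l) tensors,
        one per backbone level (shallow -> deep)."""
        fused = []
        for gate, calib, f_rgb, f_d in zip(
            self.gates, self.depth_calib, rgb_feats, depth_feats
        ):
            f_d = calib(f_d)  # calibrate the noisier depth modality
            w = gate(torch.cat([f_rgb, f_d], dim=1))  # (B, 2, 1, 1)
            w = F.softmax(w, dim=1)                   # modality weights sum to 1
            fused.append(w[:, 0:1] * f_rgb + w[:, 1:2] * f_d)
        return fused  # per-level fused features, redistributed to the head

# Toy usage with random multi-level features from a hypothetical dual-stream backbone.
if __name__ == "__main__":
    rgb = [torch.randn(1, 64, s, s) for s in (56, 28, 14)]
    depth = [torch.randn_like(f) for f in rgb]
    fusion = CrossLevelModalityFusion(channels=64, num_levels=3)
    print([f.shape for f in fusion(rgb, depth)])
```

The key design choice this sketch captures is that the fusion weights are predicted from the features themselves, so the network can lean on RGB when depth is noisy and on depth when appearance cues degrade (e.g., under illumination change).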
📝 Abstract
The integration of dual-modal features has been pivotal in advancing RGB-Depth (RGB-D) tracking. However, current trackers fuse only single-level features, which weakens fusion robustness, and their speeds fall short of the demands of real-world applications. In this paper, we introduce a novel network, denoted HMAD (Hierarchical Modality Aggregation and Distribution), to address these challenges. HMAD leverages the distinct feature representation strengths of the RGB and depth modalities and emphasizes hierarchical feature distribution and fusion, thereby enhancing the robustness of RGB-D tracking. Experimental results on several RGB-D datasets demonstrate that HMAD achieves state-of-the-art performance. Moreover, real-world experiments further validate HMAD’s capacity to handle a range of tracking challenges in real time.