HAD: Hierarchical Asymmetric Distillation to Bridge Spatio-Temporal Gaps in Event-Based Object Tracking

📅 2025-10-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
RGB and event cameras offer complementary modalities but exhibit a significant spatio-temporal asymmetry (high spatial resolution in RGB versus high temporal resolution and high dynamic range (HDR) in event streams), which hinders multi-modal object tracking. To address this, we propose a hierarchical asymmetric distillation framework that explicitly mitigates modality discrepancies via layered feature alignment and spatio-temporal consistency modeling. The approach transfers cross-modal knowledge into a lightweight student network. By integrating multi-modal knowledge distillation with joint optimization, it outperforms state-of-the-art methods across multiple benchmarks. Ablation studies confirm the effectiveness and necessity of both the hierarchical alignment and the asymmetric distillation design. To our knowledge, this is the first work to systematically model and alleviate the spatio-temporal asymmetry between RGB and event modalities, yielding a compact yet accurate multi-modal tracker that balances precision and efficiency.

📝 Abstract
RGB cameras excel at capturing rich texture details with high spatial resolution, whereas event cameras offer exceptional temporal resolution and a high dynamic range (HDR). Leveraging their complementary strengths can substantially enhance object tracking under challenging conditions, such as high-speed motion, HDR environments, and dynamic background interference. However, a significant spatio-temporal asymmetry exists between these two modalities due to their fundamentally different imaging mechanisms, hindering effective multi-modal integration. To address this issue, we propose Hierarchical Asymmetric Distillation (HAD), a multi-modal knowledge distillation framework that explicitly models and mitigates spatio-temporal asymmetries. Specifically, HAD introduces a hierarchical alignment strategy that minimizes information loss while maintaining the student network's computational efficiency and parameter compactness. Extensive experiments demonstrate that HAD consistently outperforms state-of-the-art methods, and comprehensive ablation studies further validate the effectiveness and necessity of each designed component. The code will be released soon.
Problem

Research questions and friction points this paper is trying to address.

Bridging spatio-temporal gaps between RGB and event cameras
Addressing asymmetric imaging mechanisms for multi-modal integration
Enhancing object tracking in challenging high-speed HDR conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical asymmetric distillation for spatio-temporal gaps
Hierarchical alignment strategy minimizes information loss
Maintains student network computational efficiency and compactness
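
The abstract does not spell out the HAD loss, so the following is only a generic illustration of multi-level feature distillation, not the authors' method. It assumes a teacher and a student that each expose one feature vector per level, and it combines per-level mean squared errors with level weights. The names `mse`, `hierarchical_distill_loss`, and `level_weights` are hypothetical, and the unequal weights are merely a stand-in for treating levels differently.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def hierarchical_distill_loss(
    student_feats: Sequence[Sequence[float]],
    teacher_feats: Sequence[Sequence[float]],
    level_weights: Sequence[float],
) -> float:
    """Weighted sum of per-level feature-matching losses.

    Hypothetical sketch: each list entry is the feature vector of one
    level (shallow to deep). This is plain feature-based distillation,
    not the loss defined in the HAD paper.
    """
    if not (len(student_feats) == len(teacher_feats) == len(level_weights)):
        raise ValueError("need one student, teacher, and weight entry per level")
    return sum(
        w * mse(s, t)
        for w, s, t in zip(level_weights, student_feats, teacher_feats)
    )


# Toy two-level example: the student only mismatches the teacher at level 0.
student = [[0.0, 1.0], [2.0, 2.0]]
teacher = [[0.0, 3.0], [2.0, 2.0]]
loss = hierarchical_distill_loss(student, teacher, level_weights=[0.5, 1.0])
print(loss)
```

In a real tracker the per-level features would be tensors from backbone stages, and this term would typically be added to the tracking loss during joint optimization.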
Yao Deng
Sanya Science and Education Innovation Park, Wuhan University of Technology, Sanya 572025, China, and also with the Hubei Key Laboratory of Transportation Internet of Things, School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China
Xian Zhong
Sanya Science and Education Innovation Park, Wuhan University of Technology, Sanya 572025, China, and also with the Hubei Key Laboratory of Transportation Internet of Things, School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China
Wenxuan Liu
State Key Laboratory for Multimedia Information Processing, Peking University, Beijing 100091, China
Zhaofei Yu
Peking University
Brain-inspired Computing, Spiking Neural Networks, Computational Neuroscience
Jingling Yuan
Sanya Science and Education Innovation Park, Wuhan University of Technology, Sanya 572025, China, and also with the Hubei Key Laboratory of Transportation Internet of Things, School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China
Tiejun Huang
Professor, School of Computer Science, Peking University
Visual Information Processing