BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View

📅 2023-09-05
🏛️ International Joint Conference on Artificial Intelligence
📈 Citations: 5
Influential: 0
🤖 AI Summary
Existing 3D single object tracking methods rely on fixed distribution assumptions (e.g., Gaussian or Laplacian) and complex multi-loss designs. This limits their adaptability to diverse target sizes and motion patterns and compromises both accuracy and real-time performance. To address this, we propose a lightweight BEV-based motion modeling framework that dispenses with multi-module architectures and multi-loss optimization and instead regresses target displacement directly in a single stage. We introduce a target-adaptive likelihood learning mechanism that learns a likelihood function tailored to each target, overcoming the limitations of static distribution assumptions. The framework combines BEV feature encoding with end-to-end differentiable training. Evaluated on KITTI, nuScenes, and Waymo, it achieves state-of-the-art tracking accuracy at an inference speed of 200 FPS, striking a strong balance between precision and deployability in real-time systems.
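In a motion-based tracker, the regressed displacement is typically applied to the previous frame's box to obtain the current one. The Python sketch below illustrates only that update step, assuming a motion of (dx, dy, dz, dyaw) per frame. The `Box3D` type, the frame in which the motion is expressed, and keeping the box size fixed are assumptions made for illustration, not details taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Box3D:
    """Hypothetical box type: center (x, y, z), size (l, w, h), heading yaw in radians."""
    x: float
    y: float
    z: float
    l: float
    w: float
    h: float
    yaw: float


def wrap_angle(theta: float) -> float:
    """Wrap an angle into [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def apply_motion(prev: Box3D, dx: float, dy: float, dz: float, dyaw: float) -> Box3D:
    """Move the previous box by one regressed rigid motion.

    Assumption: (dx, dy, dz) is expressed in the same frame as the box centers,
    and the size stays fixed because the same object is tracked across frames.
    """
    return Box3D(prev.x + dx, prev.y + dy, prev.z + dz,
                 prev.l, prev.w, prev.h, wrap_angle(prev.yaw + dyaw))


def track(first: Box3D, motions: Sequence[Tuple[float, float, float, float]]) -> List[Box3D]:
    """Chain per-frame motions, starting from the given first-frame box."""
    boxes = [first]
    for dx, dy, dz, dyaw in motions:
        boxes.append(apply_motion(boxes[-1], dx, dy, dz, dyaw))
    return boxes


first = Box3D(10.0, 2.0, -1.0, 4.2, 1.8, 1.6, math.pi - 0.05)
boxes = track(first, [(0.5, -0.25, 0.0, 0.1), (0.5, -0.25, 0.0, 0.0)])
print(boxes[-1])
```

Because each box depends only on the previous box and one motion estimate, per-frame cost stays constant, which is consistent with the single-stage design the summary describes.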
📝 Abstract
3D Single Object Tracking (SOT) is a fundamental task in computer vision and plays a critical role in applications like autonomous driving. However, existing algorithms often involve complex designs and multiple loss functions, making model training and deployment challenging. Furthermore, their reliance on fixed probability distribution assumptions (e.g., Laplacian or Gaussian) hinders their ability to adapt to diverse target characteristics such as varying sizes and motion patterns, ultimately affecting tracking precision and robustness. To address these issues, we propose BEVTrack, a simple yet effective motion-based tracking method. BEVTrack directly estimates object motion in Bird's-Eye View (BEV) using a single regression loss. To enhance accuracy for targets with diverse attributes, it learns adaptive likelihood functions tailored to individual targets, avoiding the limitations of fixed distribution assumptions in previous methods. This approach provides valuable priors for tracking and significantly boosts performance. Comprehensive experiments on KITTI, nuScenes, and Waymo Open Dataset demonstrate that BEVTrack achieves state-of-the-art results while operating at 200 FPS, enabling real-time applicability. The code will be released at https://github.com/xmm-prio/BEVTrack.
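The fixed assumptions the abstract criticizes can be made concrete. An L1 regression loss is, up to a constant, the negative log-likelihood (NLL) of a Laplace distribution with a fixed scale, and an L2 loss corresponds to a Gaussian with fixed variance. The sketch below checks this numerically and adds a learned-scale Laplace variant (heteroscedastic regression) in which a hypothetical network head predicts log(scale) per target. This variant is a common partial relaxation, not BEVTrack's adaptive likelihood, which the abstract describes only as learned per target: it still fixes the distribution family and adapts only the scale.

```python
import math


def laplace_nll(residual: float, scale: float) -> float:
    """Negative log-likelihood of a residual under Laplace(0, scale)."""
    return math.log(2.0 * scale) + abs(residual) / scale


def gaussian_nll(residual: float, sigma: float) -> float:
    """Negative log-likelihood of a residual under Normal(0, sigma**2)."""
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + residual ** 2 / (2.0 * sigma ** 2)


def learned_scale_laplace_nll(residual: float, log_scale: float) -> float:
    """Laplace NLL with a per-target scale given as log(scale).

    Hypothetical head output; predicting the log keeps the scale positive.
    """
    return math.log(2.0) + log_scale + abs(residual) * math.exp(-log_scale)


r = 0.3
# With a fixed unit scale, the Laplace NLL is the L1 loss plus the constant log(2).
l1_part = laplace_nll(r, 1.0) - math.log(2.0)
# With a fixed unit variance, the Gaussian NLL is half the squared error plus a constant.
l2_part = gaussian_nll(r, 1.0) - 0.5 * math.log(2.0 * math.pi)
print(l1_part, l2_part)

# A learned scale can differ per target; for a single residual the NLL
# is minimized when scale == |residual| (here 0.4).
for b in (0.2, 0.4, 0.8):
    print(b, learned_scale_laplace_nll(0.4, math.log(b)))
```

Setting the derivative of log(2b) + |r|/b with respect to b to zero gives b = |r|, which is why the middle value of the loop yields the smallest loss.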
Problem

Simplifies the complex, multi-loss designs common in 3D single object tracking
Overcomes the fixed distribution assumptions that limit tracking adaptability
Enhances accuracy for diverse target sizes and motions
Innovation

Directly estimates object motion in BEV
Learns adaptive likelihood functions per target
Uses a single regression loss for simplicity
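Estimating motion in BEV presupposes a bird's-eye-view representation of each frame's point cloud. As a minimal sketch, assuming a simple point-count rasterization (practical BEV encoders typically use learned pillar or voxel features instead), the code below bins points into a 2D grid and pairs the template-frame and search-frame grids cell by cell, one simple way to give a single regression head access to both frames. The grid ranges, cell size, and the cell-wise pairing used as fusion are hypothetical choices, not the paper's configuration.

```python
from typing import Iterable, List, Tuple

Grid = List[List[int]]


def points_to_bev(
    points: Iterable[Tuple[float, float, float]],
    x_range: Tuple[float, float] = (-4.0, 4.0),
    y_range: Tuple[float, float] = (-4.0, 4.0),
    cell: float = 1.0,
) -> Grid:
    """Count points per BEV cell (rows index y, columns index x).

    Hypothetical ranges and cell size; z is dropped in this simplified sketch,
    and points outside the ranges are ignored.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    grid = [[0] * nx for _ in range(ny)]
    for x, y, _z in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # min() guards against float rounding pushing an in-range point past the last cell.
        col = min(int((x - x_range[0]) // cell), nx - 1)
        row = min(int((y - y_range[0]) // cell), ny - 1)
        grid[row][col] += 1
    return grid


def pair_frames(template: Grid, search: Grid) -> List[List[Tuple[int, int]]]:
    """Stack two same-shape BEV grids into 2-channel cells (an assumed fusion scheme)."""
    return [list(zip(t_row, s_row)) for t_row, s_row in zip(template, search)]


template = points_to_bev([(0.5, 0.5, 0.0), (0.6, 0.4, 1.0), (-3.5, 3.5, 0.0), (10.0, 0.0, 0.0)])
search = points_to_bev([(1.5, 0.5, 0.0)])
fused = pair_frames(template, search)
print(fused[4][4], fused[4][5])
```

In this toy example the object's points shift by one cell along x between frames; a regression head reading the paired grid would be trained to output that shift as the motion estimate.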
Authors
Yuxiang Yang
School of Electronics and Information, Hangzhou Dianzi University, Hangzhou 310018, China
Yingqi Deng
School of Electronics and Information, Hangzhou Dianzi University, Hangzhou 310018, China
Mian Pan
Zheng-Jun Zha
Department of Automation, University of Science and Technology of China 230022, China
Jing Zhang
School of Computer Science, The University of Sydney, NSW 2006, Australia