🤖 AI Summary
Existing 3D single-object tracking methods rely on fixed distribution assumptions (e.g., Gaussian or Laplacian) and employ complex multi-loss designs, limiting adaptability to diverse target scales and motion patterns and thus compromising both accuracy and real-time performance. To address this, we propose a lightweight BEV-based motion modeling framework that eliminates multi-module architectures and multi-loss optimization, instead performing direct, single-stage regression of target displacement. We introduce a novel target-adaptive likelihood learning mechanism that dynamically models motion uncertainty, overcoming the limitations of static distribution assumptions. The framework integrates BEV feature encoding with end-to-end differentiable training. Evaluated on KITTI, nuScenes, and Waymo, it achieves state-of-the-art tracking accuracy while maintaining an inference speed of 200 FPS, demonstrating a superior balance between precision and deployability in real-time systems.
📝 Abstract
3D Single Object Tracking (SOT) is a fundamental task in computer vision and plays a critical role in applications like autonomous driving. However, existing algorithms often involve complex designs and multiple loss functions, making model training and deployment challenging. Furthermore, their reliance on fixed probability distribution assumptions (e.g., Laplacian or Gaussian) hinders their ability to adapt to diverse target characteristics such as varying sizes and motion patterns, ultimately affecting tracking precision and robustness. To address these issues, we propose BEVTrack, a simple yet effective motion-based tracking method. BEVTrack directly estimates object motion in Bird's-Eye View (BEV) using a single regression loss. To enhance accuracy for targets with diverse attributes, it learns adaptive likelihood functions tailored to individual targets, avoiding the limitations of fixed distribution assumptions in previous methods. This approach provides valuable priors for tracking and significantly boosts performance. Comprehensive experiments on KITTI, nuScenes, and the Waymo Open Dataset demonstrate that BEVTrack achieves state-of-the-art results while operating at 200 FPS, enabling real-time deployment. The code will be released at https://github.com/xmm-prio/BEVTrack.
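To make the "adaptive likelihood" idea concrete, the sketch below shows one common way such a loss can be formulated: a Laplace negative log-likelihood in which the network predicts not only the BEV motion offset but also a per-target log-scale, so the effective regression weight adapts to each target's uncertainty instead of being fixed. This is a minimal illustration under assumed conventions (the function name, the Laplace form, and the toy numbers are not taken from the paper):

```python
import numpy as np

def adaptive_laplace_nll(pred_offset, pred_log_scale, gt_offset):
    """Hypothetical target-adaptive regression loss (Laplace NLL, up to
    a constant): |gt - pred| / b + log(b), with b = exp(pred_log_scale)
    predicted per target. Large predicted scale -> smaller residual
    weight but a log(b) penalty, so uncertainty is learned, not fixed."""
    b = np.exp(pred_log_scale)  # per-target scale, always positive
    return np.abs(gt_offset - pred_offset) / b + pred_log_scale

# Toy example: two targets, (dx, dy) offsets in BEV.
pred = np.array([[0.5, -0.2], [1.0, 0.0]])   # predicted offsets
log_b = np.array([[0.0, 0.0], [1.0, 1.0]])   # predicted log-scales
gt = np.array([[0.6, -0.1], [1.5, 0.3]])     # ground-truth offsets
loss = adaptive_laplace_nll(pred, log_b, gt).mean()
```

Setting `pred_log_scale` to a constant recovers an ordinary (scaled) L1 loss, which is why this family of losses is often described as generalizing fixed-distribution regression.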