🤖 AI Summary
This work addresses the limitations of existing two-stage 3D point cloud object tracking methods, which rely on explicit foreground segmentation and consequently suffer from error accumulation and computational bottlenecks. To overcome these issues, the authors propose FocusTrack, an end-to-end single-stage tracking framework that jointly models motion and semantics without explicit segmentation, improving both efficiency and accuracy. The core innovation is a Focus-and-Suppress attention mechanism, integrated with a temporal-difference siamese encoder that models inter-frame motion dynamics, so that foreground features are adaptively enhanced while background noise is suppressed. Experiments on major benchmarks, including KITTI, nuScenes, and Waymo, show state-of-the-art performance at an inference speed of 105 FPS.
📝 Abstract
In 3D point cloud object tracking, motion-centric methods have emerged as a promising avenue due to their superior performance in modeling inter-frame motion. However, existing two-stage motion-based approaches suffer from two fundamental limitations: (1) error accumulation caused by decoupled optimization, since explicit foreground segmentation is performed prior to motion estimation, and (2) computational bottlenecks from sequential processing. To address these challenges, we propose FocusTrack, a novel one-stage tracking framework that unifies motion-semantics co-modeling through two core innovations: Inter-frame Motion Modeling (IMM) and Focus-and-Suppress Attention. The IMM module employs a temporal-difference siamese encoder to capture global motion patterns between adjacent frames. Focus-and-Suppress Attention enhances foreground semantics via motion-salient feature gating and suppresses background noise based on the temporal-aware motion context from IMM, without explicit segmentation. Based on these two designs, FocusTrack enables end-to-end training with a compact one-stage pipeline. Extensive experiments on prominent 3D tracking benchmarks, such as KITTI, nuScenes, and Waymo, demonstrate that FocusTrack achieves new state-of-the-art performance while running at 105 FPS.
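To make the two ideas in the abstract more concrete, the following is a minimal NumPy sketch, not the FocusTrack implementation. It assumes a single shared linear layer as the siamese encoder, a max-pooled global feature per frame whose difference serves as the motion context, and a sigmoid gate that scales per-point features. All function names, shapes, and weights here (`shared_encoder`, `focus_and_suppress`, `gate_weight`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def shared_encoder(points, weight, bias):
    """Per-point linear layer + ReLU. The same weights encode both frames (siamese)."""
    return np.maximum(points @ weight + bias, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def focus_and_suppress(prev_points, curr_points, weight, bias, gate_weight):
    """Gate current-frame features with a motion context (illustrative assumption)."""
    prev_feat = shared_encoder(prev_points, weight, bias)  # (N_prev, C)
    curr_feat = shared_encoder(curr_points, weight, bias)  # (N_curr, C)

    # Temporal difference of max-pooled global features; pooling avoids
    # needing point-to-point correspondence between the two frames.
    motion_context = curr_feat.max(axis=0) - prev_feat.max(axis=0)  # (C,)

    # Per-point gate in [0, 1]: values near 1 keep ("focus") a point's
    # features, values near 0 attenuate ("suppress") them.
    gate_logits = (curr_feat * motion_context) @ gate_weight  # (N_curr,)
    gate = sigmoid(gate_logits)[:, None]  # (N_curr, 1)

    return gate * curr_feat, gate, curr_feat


# Random stand-in data: two frames with different point counts.
rng = np.random.default_rng(0)
prev_points = rng.normal(size=(128, 3))
curr_points = rng.normal(size=(100, 3))
weight = rng.normal(size=(3, 16))
bias = np.zeros(16)
gate_weight = rng.normal(size=16)

gated, gate, curr_feat = focus_and_suppress(
    prev_points, curr_points, weight, bias, gate_weight
)
print(gated.shape, gate.shape)
```

Because the gate is a soft weight rather than a hard foreground mask, no separate segmentation step is needed in this sketch, which mirrors the one-stage motivation described above. The actual gating and encoder design in FocusTrack may differ substantially.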