FocusTrack: One-Stage Focus-and-Suppress Framework for 3D Point Cloud Object Tracking

📅 2025-10-27
🏛️ ACM Multimedia
📈 Citations: 6
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing two-stage 3D point cloud object tracking methods, which rely on explicit foreground segmentation and consequently suffer from error accumulation and computational bottlenecks. To overcome these issues, the authors propose the first end-to-end one-stage tracking framework that jointly models motion and semantics without explicit segmentation, improving both efficiency and accuracy. The core innovation is a Focus-and-Suppress attention mechanism, integrated with a temporal-difference Siamese encoder that models inter-frame motion, which adaptively enhances foreground features while suppressing background noise. Experiments on the KITTI, nuScenes, and Waymo benchmarks show state-of-the-art performance at an inference speed of 105 FPS.

📝 Abstract
In 3D point cloud object tracking, motion-centric methods have emerged as a promising avenue due to their superior performance in modeling inter-frame motion. However, existing two-stage motion-based approaches suffer from fundamental limitations: (1) error accumulation due to decoupled optimization caused by explicit foreground segmentation prior to motion estimation, and (2) computational bottlenecks from sequential processing. To address these challenges, we propose FocusTrack, a novel one-stage tracking framework that unifies motion-semantics co-modeling through two core innovations: Inter-frame Motion Modeling (IMM) and Focus-and-Suppress Attention. The IMM module employs a temporal-difference Siamese encoder to capture global motion patterns between adjacent frames. Focus-and-Suppress Attention enhances foreground semantics via motion-salient feature gating and suppresses background noise based on the temporal-aware motion context from IMM, without explicit segmentation. Based on these two designs, FocusTrack enables end-to-end training with a compact one-stage pipeline. Extensive experiments on prominent 3D tracking benchmarks, such as KITTI, nuScenes, and Waymo, demonstrate that FocusTrack achieves new state-of-the-art (SOTA) performance while running at 105 FPS.
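The two ideas named in the abstract can be illustrated with a toy NumPy sketch. This is not the FocusTrack architecture: the abstract does not specify layers, dimensions, or how the gate is computed. The function names, the single linear layer, the max-pooled global descriptor, and the sigmoid gate are all assumptions made only to show the general pattern of a shared-weight (Siamese) encoder, a temporal difference, and motion-conditioned gating in place of an explicit segmentation mask.

```python
import numpy as np


def shared_encoder(points, weight):
    """Hypothetical per-point encoder: one linear layer followed by ReLU.

    The same `weight` is applied to both frames, which is what makes the
    encoder Siamese in this sketch.
    """
    return np.maximum(points @ weight, 0.0)


def temporal_difference_context(prev_points, curr_points, weight):
    """Assumed stand-in for inter-frame motion modeling.

    Each frame is encoded with shared weights and max-pooled into a global
    descriptor; the difference of the two descriptors serves as a motion
    context vector.
    """
    prev_feat = shared_encoder(prev_points, weight)  # (N_prev, C)
    curr_feat = shared_encoder(curr_points, weight)  # (N_curr, C)
    motion_context = curr_feat.max(axis=0) - prev_feat.max(axis=0)  # (C,)
    return curr_feat, motion_context


def focus_and_suppress(curr_feat, motion_context):
    """Assumed soft gating: no hard foreground mask is produced.

    Points whose features align with the motion context get a gate near 1
    (focus); the others get a gate near 0 (suppress).
    """
    scores = np.clip(curr_feat @ motion_context, -50.0, 50.0)  # (N,)
    gate = 1.0 / (1.0 + np.exp(-scores))  # sigmoid, values in [0, 1]
    return curr_feat * gate[:, None], gate


# Toy usage: the current frame is the previous frame shifted along x.
rng = np.random.default_rng(0)
prev_points = rng.normal(size=(128, 3))
curr_points = prev_points + np.array([0.5, 0.0, 0.0])
weight = rng.normal(size=(3, 16))

curr_feat, motion_context = temporal_difference_context(
    prev_points, curr_points, weight
)
gated_feat, gate = focus_and_suppress(curr_feat, motion_context)
print(gated_feat.shape, gate.shape)
```

Because the gate is a continuous weight rather than a thresholded mask, every step in the sketch stays differentiable, which is the property that makes a one-stage, end-to-end pipeline possible in principle.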
Problem

Research questions and friction points this paper is trying to address.

3D point cloud object tracking
motion-centric methods
error accumulation
computational bottleneck
two-stage approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

one-stage tracking
Inter-frame Motion Modeling
Focus-and-Suppress Attention
3D point cloud tracking
motion-semantics co-modeling
Sifan Zhou
Southeast University
Robotics, M/LLMs, Spatial AI, Quantization
Jiahao Nie
Hangzhou Dianzi University
Computer Vision, Autonomous Driving, Object Tracking
Ziyu Zhao
University of South Carolina
Computer Vision, 2D/3D Segmentation, Generative 3D Reconstruction
Yichao Cao
Big Data Institute, Central South University, Changsha, China
Xiaobo Lu
School of Automation, Southeast University, Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Nanjing, China