Delving into Dynamic Scene Cue-Consistency for Robust 3D Multi-Object Tracking

📅 2025-08-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the insufficient robustness of 3D multi-object tracking (MOT) in autonomous driving under crowded scenes and imperfect detection outputs, this paper proposes DSC-Track: an end-to-end online tracking framework leveraging dynamic scene cue consistency. The core innovations are a spatiotemporal Point-Pair Feature (PPF) encoder and a cue-consistency Transformer module, which explicitly model inter-frame geometric relationships to suppress interference from irrelevant objects and enable stable feature matching and trajectory association. By integrating point-pair feature encoding, Transformer-based feature alignment, and dynamic feature updating, DSC-Track significantly improves association reliability under high object density and low-quality detections. Evaluated on nuScenes and the Waymo Open Dataset, it achieves state-of-the-art performance, with AMOTA scores of 73.2% (validation) and 70.3% (test) on nuScenes, substantially outperforming existing methods.

📝 Abstract
3D multi-object tracking is a critical and challenging task in the field of autonomous driving. A common paradigm relies on modeling individual object motion, e.g., Kalman filters, to predict trajectories. While effective in simple scenarios, this approach often struggles in crowded environments or with inaccurate detections, as it overlooks the rich geometric relationships between objects. This highlights the need to leverage spatial cues. However, existing geometry-aware methods can be susceptible to interference from irrelevant objects, leading to ambiguous features and incorrect associations. To address this, we propose focusing on cue-consistency: identifying and matching stable spatial patterns over time. We introduce the Dynamic Scene Cue-Consistency Tracker (DSC-Track) to implement this principle. Firstly, we design a unified spatiotemporal encoder using Point Pair Features (PPF) to learn discriminative trajectory embeddings while suppressing interference. Secondly, our cue-consistency transformer module explicitly aligns consistent feature representations between historical tracks and current detections. Finally, a dynamic update mechanism preserves salient spatiotemporal information for stable online tracking. Extensive experiments on the nuScenes and Waymo Open Datasets validate the effectiveness and robustness of our approach. On the nuScenes benchmark, for instance, our method achieves state-of-the-art performance, reaching 73.2% and 70.3% AMOTA on the validation and test sets, respectively.
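The abstract describes a cue-consistency transformer that aligns feature representations between historical tracks and current detections. The page does not give the module's architecture, but the core alignment step can be illustrated with plain cross-attention, where track embeddings act as queries over detection embeddings. This is a minimal sketch under that assumption; `cross_attention` and its shapes are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(track_emb, det_emb):
    """One cross-attention step: historical track embeddings (queries)
    attend over current detection embeddings (keys/values).

    track_emb: (T, d) array, one row per active track.
    det_emb:   (D, d) array, one row per current detection.
    Returns the track features re-expressed over detections, plus the
    (T, D) attention map, which can be read as soft association scores.
    """
    d = track_emb.shape[-1]
    scores = track_emb @ det_emb.T / np.sqrt(d)  # (T, D) similarity
    attn = softmax(scores, axis=-1)              # rows sum to 1
    aligned = attn @ det_emb                     # (T, d) aligned features
    return aligned, attn
```

In an actual tracker, the attention map would feed a matching step (e.g. Hungarian assignment) to produce hard track-detection associations.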
Problem

Research questions and friction points this paper is trying to address.

Improving 3D multi-object tracking in crowded environments
Leveraging spatial cues for robust object association
Reducing interference from irrelevant geometric relationships
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Point Pair Features for spatiotemporal encoding
Employs cue-consistency transformer for feature alignment
Dynamic update mechanism preserves spatiotemporal information
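The page does not spell out how DSC-Track adapts Point Pair Features to tracking, but the classic PPF from point-cloud registration (Drost et al.) conveys the underlying idea: a 4-vector describing the relative geometry of two oriented points, invariant to rigid motion, which makes it a natural cue for inter-object relationships. A minimal sketch, with `point_pair_feature` as an illustrative name:

```python
import numpy as np

def point_pair_feature(p1, n1, p2, n2):
    """Classic Point Pair Feature: (||d||, angle(n1, d), angle(n2, d), angle(n1, n2)),
    where d = p2 - p1 and n1, n2 are unit orientation vectors.
    The 4-vector is invariant to rigid transformations of the pair."""
    d = p2 - p1
    dist = np.linalg.norm(d)
    if dist < 1e-9:
        return np.zeros(4)
    d_hat = d / dist

    def angle(a, b):
        # Clip guards against arccos domain errors from rounding.
        return np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))

    return np.array([dist, angle(n1, d_hat), angle(n2, d_hat), angle(n1, n2)])
```

For 3D boxes, the points could be box centers and the orientation vectors heading directions; encoding such pairwise features over a scene is one plausible route to the interference-suppressing embeddings the paper describes.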
👥 Authors
Haonan Zhang
Zhejiang University
Xinyao Wang
Amazon AGI
Boxi Wu
Zhejiang University
Tu Zheng
Fabu
Wang Yunhua
Shandong Land-Sea-Nexus Digital Technology
Zheng Yang
Fabu