TQD-Track: Temporal Query Denoising for 3D Multi-Object Tracking

📅 2025-04-04
🤖 AI Summary
Existing query denoising methods operate within a single frame and therefore cannot teach a tracker temporal information, while the attention mask used in query denoising blocks information exchange between denoising and object queries. To address these limitations, we propose Temporal Query Denoising (TQD), which extends query denoising across frames so that denoising queries carry temporal information and instance-specific features. Diverse noise types applied to the denoising queries simulate real-world challenges in multi-object tracking (MOT). For paradigms with an explicit learned association module, an association mask keeps the interaction between track and detection queries during training consistent with inference. Evaluated on nuScenes, TQD consistently improves different DETR-based 3D multi-object tracking methods, with the largest gains for paradigms that have an explicit association module, such as tracking-by-detection or alternating detection and association. It changes only the training process.

📝 Abstract
Query denoising has become a standard training strategy for DETR-based detectors because it addresses their slow convergence. Beyond that, query denoising can increase the diversity of training samples for modeling complex scenarios, which is critical for Multi-Object Tracking (MOT) and shows its potential for MOT applications. Existing approaches integrate query denoising within the tracking-by-attention paradigm. However, because the denoising process happens only within a single frame, it cannot help the tracker learn temporal information. In addition, the attention mask in query denoising prevents information exchange between denoising and object queries, limiting its potential to improve association through self-attention. To address these issues, we propose TQD-Track, which introduces Temporal Query Denoising (TQD) tailored for MOT, enabling denoising queries to carry temporal information and instance-specific feature representations. We apply diverse noise types to the denoising queries to simulate real-world challenges in MOT. We analyze TQD for different tracking paradigms and find that paradigms with an explicit learned data association module, e.g., tracking-by-detection or alternating detection and association, benefit from TQD by a larger margin. For these paradigms, we further design an association mask in the association module to ensure that track and detection queries interact during training as they do during inference. Extensive experiments on the nuScenes dataset demonstrate that our approach consistently enhances different tracking methods by changing only the training process, especially paradigms with an explicit association module.
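The idea of denoising queries that carry temporal information can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the noise types or query layout, so the three noise patterns used here (center jitter, dropped instances, spurious queries), the `DenoisingQuery` type, the function name, and all parameters are assumptions chosen for illustration. The sketch perturbs ground-truth box centers from frame t-1 and pairs each one with the same instance's ground truth in frame t, so the reconstruction target spans two frames.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float]  # hypothetical (x, y, z) box center in metres


@dataclass
class DenoisingQuery:
    instance_id: Optional[int]    # None marks a spurious (false-positive) query
    noisy_center: Box             # perturbed center taken from frame t-1
    target_center: Optional[Box]  # ground-truth center in frame t, None if absent


def make_temporal_denoising_queries(
    prev_gt: Dict[int, Box],
    curr_gt: Dict[int, Box],
    jitter: float = 0.5,
    drop_prob: float = 0.1,
    num_spurious: int = 1,
    rng: Optional[random.Random] = None,
) -> List[DenoisingQuery]:
    """Build denoising queries from frame t-1 with targets in frame t.

    All noise types below are illustrative assumptions, not the paper's list.
    """
    rng = rng or random.Random()
    queries: List[DenoisingQuery] = []
    for inst_id, center in prev_gt.items():
        # Assumed noise type 1: randomly drop an instance (missed detection).
        if rng.random() < drop_prob:
            continue
        # Assumed noise type 2: jitter the center (localization error).
        noisy = tuple(c + rng.uniform(-jitter, jitter) for c in center)
        # The target is the same instance one frame later; it is None when
        # the instance has left the scene.
        queries.append(DenoisingQuery(inst_id, noisy, curr_gt.get(inst_id)))
    # Assumed noise type 3: spurious queries with no matching instance.
    for _ in range(num_spurious):
        fake = tuple(rng.uniform(-50.0, 50.0) for _ in range(3))
        queries.append(DenoisingQuery(None, fake, None))
    return queries
```

In a real tracker these queries would be embedded and passed through the decoder alongside the regular queries. Only the pairing of a noisy previous-frame state with a current-frame target is shown here.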

Problem

Research questions and friction points this paper is trying to address.

Enhances 3D tracking by enabling temporal query denoising
Improves self-attention-based association, which the attention mask in standard query denoising limits
Boosts tracking methods with explicit association modules

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Temporal Query Denoising for MOT
Diverse noise types simulate real-world challenges
Association mask ensures consistent query interaction
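
To make the role of a mask concrete, the sketch below builds a boolean self-attention mask over denoising and object queries. With `isolate_denoising=True` it follows the conventional query-denoising layout, where object queries cannot attend to denoising queries and denoising groups cannot attend to each other. With `isolate_denoising=False` it lets object (detection) queries attend to denoising (track) queries. This relaxed variant only illustrates the kind of interaction an association mask could permit. The exact mask used in TQD-Track is not described in this summary, and the function name, parameters, and query ordering are assumptions.

```python
from typing import List


def build_attention_mask(
    num_groups: int,
    group_size: int,
    num_object: int,
    isolate_denoising: bool = True,
) -> List[List[bool]]:
    """Return mask[q][k]; True means query q may NOT attend to key k.

    Assumed layout: all denoising queries first (group by group),
    followed by the object queries.
    """
    num_dn = num_groups * group_size
    total = num_dn + num_object
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            q_is_dn, k_is_dn = q < num_dn, k < num_dn
            if q_is_dn and k_is_dn:
                # Denoising groups never see each other.
                mask[q][k] = (q // group_size) != (k // group_size)
            elif not q_is_dn and k_is_dn:
                # Conventional denoising hides denoising queries from object
                # queries; the relaxed (hypothetical) variant does not.
                mask[q][k] = isolate_denoising
            # Denoising -> object and object -> object stay visible.
    return mask
```

For example, `build_attention_mask(2, 2, 3)` yields a 7x7 mask in which every object-query row blocks the four denoising columns, while passing `isolate_denoising=False` clears those entries and leaves the cross-group blocking intact.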