Open3DTrack: Towards Open-Vocabulary 3D Multi-Object Tracking

📅 2024-10-02
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limitation of conventional 3D multi-object tracking (MOT) methods—namely, their reliance on predefined categories—this paper introduces the novel task of open-vocabulary 3D MOT, enabling real-time 3D tracking and state estimation of objects from previously unseen categories in autonomous driving scenarios. Methodologically, we propose a new benchmark split for unknown categories, design a category-agnostic cross-modal feature alignment mechanism, and develop a temporal-aware graph neural network for trajectory association. Our framework integrates point-cloud encoding, text-guided vision-language alignment, and contrastive-learning-driven adaptive pseudo-labeling. Evaluated on multiple outdoor driving datasets, our approach robustly tracks over 15 unseen object categories, achieves a 23.6% improvement in mMOTA, substantially narrows the performance gap between known and unknown categories, and demonstrates superior generalization compared to existing 3D trackers.
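The summary above mentions a category-agnostic cross-modal alignment step, in which detections are matched to free-form text prompts rather than a fixed label set. The paper's actual architecture is not reproduced on this page; the snippet below is only a minimal NumPy sketch of the general open-vocabulary classification idea, assigning each detection embedding the text prompt with the highest cosine similarity. All names here (`assign_open_vocab_labels`, the toy 2-D embeddings) are illustrative assumptions, not the paper's API.

```python
import numpy as np

def cosine_sim(a, b):
    # Normalize rows, then take pairwise dot products: (n_det, n_text) similarities.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def assign_open_vocab_labels(det_embeds, text_embeds, prompts):
    """Label each detection with the prompt whose text embedding it is closest to.

    det_embeds:  (n_det, d) visual/point-cloud detection embeddings
    text_embeds: (n_text, d) embeddings of free-form category prompts
    prompts:     list of n_text prompt strings (the open vocabulary)
    """
    sims = cosine_sim(np.asarray(det_embeds, float), np.asarray(text_embeds, float))
    return [prompts[i] for i in sims.argmax(axis=1)]
```

Because the label set is just a list of prompts, previously unseen categories can be added at inference time without retraining, which is the core of the open-vocabulary setting.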

📝 Abstract
3D multi-object tracking plays a critical role in autonomous driving by enabling the real-time monitoring and prediction of multiple objects' movements. Traditional 3D tracking systems are typically constrained by predefined object categories, limiting their adaptability to novel, unseen objects in dynamic environments. To address this limitation, we introduce open-vocabulary 3D tracking, which extends the scope of 3D tracking to include objects beyond predefined categories. We formulate the problem of open-vocabulary 3D tracking and introduce dataset splits designed to represent various open-vocabulary scenarios. We propose a novel approach that integrates open-vocabulary capabilities into a 3D tracking framework, allowing for generalization to unseen object classes. Our method effectively reduces the performance gap between tracking known and novel objects through strategic adaptation. Experimental results demonstrate the robustness and adaptability of our method in diverse outdoor driving scenarios. To the best of our knowledge, this work is the first to address open-vocabulary 3D tracking, presenting a significant advancement for autonomous systems in real-world settings. Code, trained models, and dataset splits are available publicly.
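The abstract frames tracking as associating current detections with existing trajectories regardless of category. The paper itself uses a learned association mechanism that is not shown here; as a baseline intuition only, the following is a hedged sketch of greedy nearest-neighbor association on 3D box centers, where the `max_dist` gating threshold and all function names are illustrative assumptions.

```python
import numpy as np

def associate(tracks, detections, max_dist=2.0):
    """Greedily match each track to its nearest unmatched detection (category-agnostic).

    tracks, detections: lists of 3D center coordinates [x, y, z]
    Returns (matches, unmatched_detections), where matches are (track_idx, det_idx) pairs.
    """
    matches, unmatched = [], set(range(len(detections)))
    for ti, t in enumerate(tracks):
        if not unmatched:
            break
        # Euclidean distance from this track to every still-unmatched detection.
        dists = {di: np.linalg.norm(np.asarray(t, float) - np.asarray(detections[di], float))
                 for di in unmatched}
        di, d = min(dists.items(), key=lambda kv: kv[1])
        if d <= max_dist:  # gate: reject implausibly distant matches
            matches.append((ti, di))
            unmatched.discard(di)
    return matches, sorted(unmatched)
```

Leftover unmatched detections would typically spawn new tracks, which is how a category-agnostic tracker can begin following an object it has never been trained to classify.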
Problem

Research questions and friction points this paper is trying to address:

- Enhance the adaptability of 3D multi-object tracking
- Extend tracking to unseen object categories
- Reduce the performance gap on novel objects
Innovation

Methods, ideas, and system contributions that make the work stand out:

- Integration of open-vocabulary capabilities into a 3D tracking framework
- Generalization to unseen object classes
- Robust performance in diverse outdoor driving scenarios
👥 Authors

- Ayesha Ishaq (Mohamed bin Zayed University of Artificial Intelligence, MBZUAI)
- Mohamed El Amine Boudjoghra (MBZUAI)
- Jean Lahoud (MBZUAI)
- F. Khan (MBZUAI; Linköping University)
- Salman H. Khan (MBZUAI; Australian National University)
- Hisham Cholakkal (MBZUAI)
- R. Anwer (MBZUAI; Aalto University)