Open3DTrack: Towards Open-Vocabulary 3D Multi-Object Tracking

📅 2024-10-02
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limitation of conventional 3D multi-object tracking (MOT) methods—namely, their reliance on predefined categories—this paper introduces the novel task of open-vocabulary 3D MOT, enabling real-time 3D tracking and state estimation of objects from previously unseen categories in autonomous driving scenarios. Methodologically, we propose a new benchmark split for unknown categories, design a category-agnostic cross-modal feature alignment mechanism, and develop a temporal-aware graph neural network for trajectory association. Our framework integrates point-cloud encoding, text-guided vision-language alignment, and contrastive-learning-driven adaptive pseudo-labeling. Evaluated on multiple outdoor driving datasets, our approach robustly tracks over 15 unseen object categories, achieves a 23.6% improvement in mMOTA, substantially narrows the performance gap between known and unknown categories, and demonstrates superior generalization compared to existing 3D trackers.
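The summary above mentions a category-agnostic cross-modal alignment step, in which detections are matched to free-form text prompts rather than a fixed label set. The paper's actual architecture is not reproduced on this page; the snippet below is only a minimal NumPy sketch of the general open-vocabulary classification idea, assigning each detection embedding the text prompt with the highest cosine similarity. All names here (`assign_open_vocab_labels`, the toy 2-D embeddings) are illustrative assumptions, not the paper's API.

```python
import numpy as np

def cosine_sim(a, b):
    # Normalize rows, then take pairwise dot products: (n_det, n_text) similarities.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def assign_open_vocab_labels(det_embeds, text_embeds, prompts):
    """Label each detection with the prompt whose text embedding it is closest to.

    det_embeds:  (n_det, d) visual/point-cloud detection embeddings
    text_embeds: (n_text, d) embeddings of free-form category prompts
    prompts:     list of n_text prompt strings (the open vocabulary)
    """
    sims = cosine_sim(np.asarray(det_embeds, float), np.asarray(text_embeds, float))
    return [prompts[i] for i in sims.argmax(axis=1)]
```

Because the label set is just a list of prompts, previously unseen categories can be added at inference time without retraining, which is the core of the open-vocabulary setting.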

📝 Abstract
3D multi-object tracking plays a critical role in autonomous driving by enabling the real-time monitoring and prediction of multiple objects' movements. Traditional 3D tracking systems are typically constrained by predefined object categories, limiting their adaptability to novel, unseen objects in dynamic environments. To address this limitation, we introduce open-vocabulary 3D tracking, which extends the scope of 3D tracking to include objects beyond predefined categories. We formulate the problem of open-vocabulary 3D tracking and introduce dataset splits designed to represent various open-vocabulary scenarios. We propose a novel approach that integrates open-vocabulary capabilities into a 3D tracking framework, allowing for generalization to unseen object classes. Our method effectively reduces the performance gap between tracking known and novel objects through strategic adaptation. Experimental results demonstrate the robustness and adaptability of our method in diverse outdoor driving scenarios. To the best of our knowledge, this work is the first to address open-vocabulary 3D tracking, presenting a significant advancement for autonomous systems in real-world settings. Code, trained models, and dataset splits are available publicly.
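The abstract frames tracking as associating current detections with existing trajectories regardless of category. The paper itself uses a learned association mechanism that is not shown here; as a baseline intuition only, the following is a hedged sketch of greedy nearest-neighbor association on 3D box centers, where the `max_dist` gating threshold and all function names are illustrative assumptions.

```python
import numpy as np

def associate(tracks, detections, max_dist=2.0):
    """Greedily match each track to its nearest unmatched detection (category-agnostic).

    tracks, detections: lists of 3D center coordinates [x, y, z]
    Returns (matches, unmatched_detections), where matches are (track_idx, det_idx) pairs.
    """
    matches, unmatched = [], set(range(len(detections)))
    for ti, t in enumerate(tracks):
        if not unmatched:
            break
        # Euclidean distance from this track to every still-unmatched detection.
        dists = {di: np.linalg.norm(np.asarray(t, float) - np.asarray(detections[di], float))
                 for di in unmatched}
        di, d = min(dists.items(), key=lambda kv: kv[1])
        if d <= max_dist:  # gate: reject implausibly distant matches
            matches.append((ti, di))
            unmatched.discard(di)
    return matches, sorted(unmatched)
```

Leftover unmatched detections would typically spawn new tracks, which is how a category-agnostic tracker can begin following an object it has never been trained to classify.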
Problem

Research questions and friction points this paper is trying to address:

- Enhance the adaptability of 3D multi-object tracking
- Extend tracking to unseen object categories
- Reduce the performance gap on novel objects
Innovation

Methods, ideas, and system contributions that make the work stand out:

- Integration of open-vocabulary capabilities into a 3D tracking framework
- Generalization to unseen object classes
- Robust performance in diverse outdoor driving scenarios
👥 Authors

- Ayesha Ishaq (Mohamed bin Zayed University of Artificial Intelligence, MBZUAI)
- Mohamed El Amine Boudjoghra (MBZUAI)
- Jean Lahoud (MBZUAI)
- F. Khan (MBZUAI; Linköping University)
- Salman H. Khan (MBZUAI; Australian National University)
- Hisham Cholakkal (MBZUAI)
- R. Anwer (MBZUAI; Aalto University)