UNION: Unsupervised 3D Object Detection using Object Appearance-based Pseudo-Classes

📅 2024-05-24
🏛️ Neural Information Processing Systems
📈 Citations: 2
Influential: 0
🤖 AI Summary
Recent unsupervised 3D object detection methods learn from dynamic objects and penalize detections of static instances during training, so they need multiple computationally expensive rounds of self-training to recover static objects. This paper proposes UNION, which obtains static and dynamic mobile objects together so that existing detectors can be trained in a single training run. The method (1) generates static and dynamic object proposals from LiDAR using spatial clustering and self-supervised scene flow; (2) encodes the visual appearance of each proposal and keeps static instances that are visually similar to dynamic objects; and (3) extends 3D object discovery to detection by using appearance-based cluster labels as pseudo-class labels for training object classification. On nuScenes, UNION more than doubles the state-of-the-art average precision for unsupervised 3D object discovery, reaching 39.5.

📝 Abstract
Unsupervised 3D object detection methods have emerged to leverage vast amounts of data without requiring manual labels for training. Recent approaches rely on dynamic objects for learning to detect mobile objects but penalize the detections of static instances during training. Multiple rounds of self-training are used to add detected static instances to the set of training targets; this procedure to improve performance is computationally expensive. To address this, we propose the method UNION. We use spatial clustering and self-supervised scene flow to obtain a set of static and dynamic object proposals from LiDAR. Subsequently, object proposals' visual appearances are encoded to distinguish static objects in the foreground and background by selecting static instances that are visually similar to dynamic objects. As a result, static and dynamic mobile objects are obtained together, and existing detectors can be trained with a single training. In addition, we extend 3D object discovery to detection by using object appearance-based cluster labels as pseudo-class labels for training object classification. We conduct extensive experiments on the nuScenes dataset and increase the state-of-the-art performance for unsupervised 3D object discovery, i.e. UNION more than doubles the average precision to 39.5. The code is available at github.com/TedLentsch/UNION.
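The proposal step described in the abstract (spatial clustering plus scene flow) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name a clustering algorithm, so a simple distance-threshold connected-components grouping is swapped in here, and the function names, the `radius` and `speed_threshold` parameters, and the toy data are all hypothetical. The per-point flow vectors are assumed to come from some self-supervised scene flow estimator.

```python
import numpy as np

def cluster_points(points, radius):
    """Group points into connected components: two points share a cluster
    when a chain of neighbors closer than `radius` links them.
    (Illustrative stand-in; the abstract does not specify the algorithm.)"""
    n = len(points)
    labels = np.full(n, -1, dtype=int)
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue  # already assigned to an earlier cluster
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            dists = np.linalg.norm(points - points[i], axis=1)
            neighbors = np.where((dists < radius) & (labels == -1))[0]
            labels[neighbors] = current
            stack.extend(neighbors.tolist())
        current += 1
    return labels

def split_static_dynamic(labels, flow, speed_threshold):
    """Mark a cluster as dynamic when the mean flow magnitude of its points
    exceeds `speed_threshold` (a hypothetical parameter)."""
    is_dynamic = {}
    for c in np.unique(labels):
        mean_speed = np.linalg.norm(flow[labels == c], axis=1).mean()
        is_dynamic[int(c)] = bool(mean_speed > speed_threshold)
    return is_dynamic

# Toy scene: two points near the origin (not moving), two points 10 m away (moving).
points = np.array([[0.0, 0.0, 0.0], [0.3, 0.0, 0.0],
                   [10.0, 0.0, 0.0], [10.2, 0.1, 0.0]])
flow = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                 [1.5, 0.0, 0.0], [1.4, 0.0, 0.0]])

labels = cluster_points(points, radius=0.5)
is_dynamic = split_static_dynamic(labels, flow, speed_threshold=0.5)
print(labels.tolist(), is_dynamic)
```

In this toy scene the first cluster is labeled static and the second dynamic. Whether a static cluster is a parked vehicle or background structure cannot be decided from motion alone, which is the gap the appearance encoding is meant to close.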

Problem

Research questions and friction points this paper is trying to address.

Unsupervised 3D object detection without manual labels
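Detecting static instances of mobile objects, which methods that learn from dynamic objects penalize during training
Training object classification from appearance-based cluster labels instead of manual class labels

Innovation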

Methods, ideas, or system contributions that make the work stand out.

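Spatial clustering and self-supervised scene flow to obtain static and dynamic object proposals from LiDAR
Visual appearance encoding to select static instances that are visually similar to dynamic objects
Object appearance-based cluster labels used as pseudo-class labels for training object classification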
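The appearance-based selection and pseudo-class ideas listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the method as implemented in the paper: the appearance embeddings are taken as given, a plain k-means with deterministic initialization stands in for whatever clustering the authors use, and `min_dynamic_fraction`, the function names, and the toy data are hypothetical.

```python
import numpy as np

def assign_to_nearest(features, centers):
    """Index of the nearest center for every feature vector."""
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(features, k, iters=10):
    """Plain k-means, initialized with the first k features so that this
    sketch is deterministic (an assumption made for illustration)."""
    centers = features[:k].copy()
    for _ in range(iters):
        assign = assign_to_nearest(features, centers)
        for c in range(k):
            if np.any(assign == c):
                centers[c] = features[assign == c].mean(axis=0)
    return assign_to_nearest(features, centers)

def pseudo_classes(cluster_ids, is_dynamic, min_dynamic_fraction):
    """Keep appearance clusters that contain enough dynamic proposals and use
    the cluster id as pseudo-class label; other proposals get -1 (background)."""
    labels = np.full(len(cluster_ids), -1, dtype=int)
    for c in np.unique(cluster_ids):
        members = cluster_ids == c
        if is_dynamic[members].mean() >= min_dynamic_fraction:
            labels[members] = c
    return labels

# Toy appearance embeddings for six proposals in three visual groups.
features = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
                     [0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]])
# Only proposals 0 and 1 were observed moving.
is_dynamic = np.array([True, True, False, False, False, False])

cluster_ids = kmeans(features, k=3)
labels = pseudo_classes(cluster_ids, is_dynamic, min_dynamic_fraction=0.05)
print(labels.tolist())
```

Proposals 3 and 4 are static but share an appearance cluster with a dynamic proposal, so they are kept and inherit that cluster's pseudo-class. Proposals 2 and 5 sit in a cluster with no dynamic members and are treated as background.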