OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection

📅 2025-03-09

📈 Citations: 0

✨ Influential: 0

career value

194K/year

🤖 AI Summary

Open-vocabulary 3D object detection aims to localize objects from unseen categories in point clouds, yet existing methods suffer from poor cross-modal alignment robustness due to semantic inconsistency between 3D point cloud features and 2D image features. To address this, we propose a semantic consistency alignment mechanism: (1) high-quality 3D pseudo-labels are mined via self-supervised learning; (2) a dynamic alignment quality assessment module jointly filters noisy matches from multiple sources; and (3) the 3D detector is deeply integrated with a vision-language model to enable open-vocabulary classification and precise 3D localization. Evaluated on nuScenes, our method achieves state-of-the-art performance—improving recall for novel categories by +12.3% and 3D localization accuracy (AP) by +8.7%—demonstrating significant gains in both generalization and geometric precision.

Technology Category

Application Category

📝 Abstract

Open-vocabulary 3D object detection for autonomous driving aims to detect novel objects beyond the predefined training label sets in point cloud scenes. Existing approaches achieve this by connecting traditional 3D object detectors with vision-language models (VLMs) to regress 3D bounding boxes for novel objects and perform open-vocabulary classification through cross-modal alignment between 3D and 2D features. However, achieving robust cross-modal alignment remains a challenge due to semantic inconsistencies when generating corresponding 3D and 2D feature pairs. To overcome this challenge, we present OV-SCAN, an Open-Vocabulary 3D framework that enforces Semantically Consistent Alignment for Novel object discovery. OV-SCAN employs two core strategies: discovering precise 3D annotations and filtering out low-quality or corrupted alignment pairs (arising from 3D annotation, occlusion-induced, or resolution-induced noise). Extensive experiments on the nuScenes dataset demonstrate that OV-SCAN achieves state-of-the-art performance.

Problem

Research questions and friction points this paper is trying to address.

Detects novel objects in 3D point clouds for autonomous driving.

Addresses semantic inconsistencies in cross-modal 3D-2D feature alignment.

Improves open-vocabulary classification by filtering low-quality alignment pairs.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Enforces semantically consistent cross-modal alignment

Discovers precise 3D annotations for novel objects

Filters low-quality alignment pairs to reduce noise

🔎 Similar Papers

Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection