🤖 AI Summary
This work addresses the detection of out-of-distribution (OOD) objects in real-world point clouds, where 3D vision-language models (3D VLMs) suffer from a synthetic-to-real domain shift because they rely on synthetic data for pretraining. We propose a plug-and-play, training-free neighborhood propagation scoring method: a k-nearest-neighbor graph is constructed in the 3D VLM embedding space using a joint geometric-semantic distance metric, and confidence propagation over this graph then quantifies the degradation of text-point cloud alignment to identify OOD samples. Crucially, this is the first approach to explicitly model alignment degradation as the core bottleneck of domain shift, establishing a neighborhood-propagation-based paradigm for OOD detection in 3D VLMs. Evaluated on multiple real-world point cloud benchmarks, our method achieves state-of-the-art performance, outperforming existing approaches, and demonstrates strong cross-domain robustness.
📝 Abstract
As point cloud data becomes more prevalent across a variety of applications, the ability to detect out-of-distribution (OOD) point cloud objects becomes critical for ensuring model safety and reliability. Yet this problem remains under-explored in existing research. Inspired by success in the image domain, we propose to exploit advances in 3D vision-language models (3D VLMs) for OOD detection in point cloud objects. A major challenge is that the point cloud datasets used to pre-train 3D VLMs are drastically smaller in size and object diversity than their image-based counterparts. Critically, they often contain exclusively computer-designed synthetic objects. This leads to a substantial domain shift when the model is transferred to practical tasks involving real objects scanned from the physical environment. Our experiments show that this synthetic-to-real domain shift significantly degrades the alignment of point clouds with their associated text embeddings in the 3D VLM latent space, hindering downstream performance. To address this, we propose a novel methodology called SODA, which improves the detection of OOD point clouds through a neighborhood-based score propagation scheme. SODA is inference-based, requires no additional model training, and achieves state-of-the-art performance over existing approaches across datasets and problem settings.
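
The general idea of neighborhood-based score propagation can be illustrated with a minimal NumPy sketch. This is not the SODA implementation. The function names, the max-softmax initial score, the cosine-similarity k-nearest-neighbor graph, and the parameters `k`, `alpha`, `n_iters`, and `temperature` are all assumptions chosen for illustration, and the point cloud and text embeddings are assumed to have been precomputed by a 3D VLM.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def initial_scores(point_emb, text_emb, temperature=0.01):
    """Hypothetical per-sample confidence: max softmax over class-prompt similarities.

    point_emb: (n, d) point cloud embeddings; text_emb: (c, d) text embeddings
    of the in-distribution class names. Higher means more in-distribution.
    """
    logits = l2_normalize(point_emb) @ l2_normalize(text_emb).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.max(axis=1)


def knn_indices(point_emb, k):
    """Indices of the k most cosine-similar other samples for each sample."""
    z = l2_normalize(point_emb)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)  # a sample is not its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]


def propagate_scores(scores, neighbors, alpha=0.5, n_iters=10):
    """Repeatedly blend each original score with the mean score of its neighbors.

    alpha (assumed hyperparameter) controls how much weight the neighborhood gets.
    """
    scores = np.asarray(scores, dtype=float)
    s = scores.copy()
    for _ in range(n_iters):
        s = (1.0 - alpha) * scores + alpha * s[neighbors].mean(axis=1)
    return s
```

In this sketch, a sample whose own text alignment is unreliable inherits evidence from nearby samples in the embedding space, and samples with a low propagated score would be flagged as OOD. The threshold for that decision is left open here, since the abstract does not specify one.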