AI Summary
Existing methods for 3D open-world object classification suffer from poor robustness due to their reliance on 2D projections, failing to handle arbitrary shapes, unconstrained poses, and unknown categories. This paper introduces the first training-free open-set classification framework that leverages geometric priors from 3D generative models. Specifically: (1) it pioneers the use of score-based or diffusion-based 3D generative models as geometric priors to guide zero-shot semantic alignment; (2) it designs a rotation-invariant feature extractor to achieve pose-agnostic representation learning; and (3) it constructs an end-to-end zero-shot inference architecture that recognizes unseen categories and classifies robustly under arbitrary viewpoints. Evaluated on ModelNet10 and McGill, the method achieves state-of-the-art results, improving overall classification accuracy by 32.0% and 8.7%, respectively.
Abstract
3D open-world classification is a challenging yet essential task in dynamic and unstructured real-world scenarios, requiring both open-category and open-pose recognition. To address these challenges, recent methods often adopt sophisticated 2D pre-trained models to provide enriched and stable representations. However, these methods hinge on projecting 3D objects into 2D space, a problem that remains poorly solved and thus significantly limits their performance. Departing from these efforts, in this paper we make a pioneering exploration of 3D generative models for 3D open-world classification. Drawing on the abundant prior knowledge in 3D generative models, we additionally craft a rotation-invariant feature extractor. This synergy endows our pipeline with the advantages of being training-free, open-category, and pose-invariant, making it well suited to 3D open-world classification. Extensive experiments on benchmark datasets demonstrate the potential of generative models in 3D open-world classification, achieving state-of-the-art performance on ModelNet10 and McGill with 32.0% and 8.7% overall accuracy improvement, respectively.
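To make the pose-invariance idea concrete, here is a minimal illustrative sketch (not the paper's actual extractor, whose design is not detailed here): one classic way to build a rotation-invariant descriptor of a point cloud is to summarize it by its sorted pairwise distances, which rigid rotations leave unchanged. All function names below are hypothetical.

```python
# Illustrative sketch only: sorted pairwise distances form a simple
# rotation-invariant point-cloud signature, since rigid rotations
# preserve all inter-point distances.
import math

def pairwise_distance_signature(points):
    """Return a rotation-invariant signature: the sorted pairwise distances."""
    dists = []
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            # round lightly to absorb floating-point noise from rotation
            dists.append(round(math.dist(points[i], points[j]), 6))
    return sorted(dists)

def rotate_z(points, theta):
    """Rotate 3D points about the z-axis by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for (x, y, z) in points]

cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
sig_a = pairwise_distance_signature(cloud)
sig_b = pairwise_distance_signature(rotate_z(cloud, 1.234))
print(sig_a == sig_b)  # True: the signature is unchanged under rotation
```

Real pose-invariant extractors are far richer than this toy signature, but the same principle applies: features built from rotation-invariant quantities need no pose normalization at inference time.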