🤖 AI Summary
This work addresses model-free, zero-shot 2D detection of unseen objects in unconstrained scenes. Methodologically, it introduces the first framework integrating Gaussian splatting with vision foundation models (VFMs): geometry is modeled via Gaussian reconstruction from a reference video, while semantic features are distilled from VFMs such as SAM and CLIP, yielding a joint geometric-semantic representation that removes any reliance on CAD templates or predefined 3D models. The framework thus enables fast, category-agnostic object localization from a single reference video alone. Evaluated on the BOP-H3 benchmark, it matches the performance of CAD-based methods, and in the model-free 2D detection track of the BOP Challenge 2024 it received both the best overall method and the best fast method awards. These results validate the model-free paradigm as a practical basis for real-world 6D pose estimation pipelines.
📝 Abstract
We present GFreeDet, an approach for unseen object detection that leverages Gaussian splatting and vision foundation models under the model-free setting. Unlike existing methods that rely on predefined CAD templates, GFreeDet reconstructs objects directly from reference videos via Gaussian splatting, enabling robust detection of novel objects without prior 3D models. On the BOP-H3 benchmark, GFreeDet achieves performance comparable to CAD-based methods, demonstrating the viability of model-free detection for mixed reality (MR) applications. Notably, GFreeDet won the best overall method and the best fast method awards in the model-free 2D detection track of the BOP Challenge 2024.
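The abstract implies a matching stage: views rendered from the Gaussian reconstruction serve as templates, and detection proposals in a test image are scored against them in a VFM embedding space. The sketch below is a minimal, hypothetical illustration of such category-agnostic matching via cosine similarity; the function name `match_proposals` and the toy feature vectors are assumptions, not part of GFreeDet, and in practice the embeddings would come from models like CLIP applied to SAM-segmented proposals and rendered template views.

```python
import numpy as np

def match_proposals(template_feats, proposal_feats):
    """Score each detection proposal by its best cosine similarity
    to any rendered-template embedding (category-agnostic matching).
    Note: hypothetical sketch, not the GFreeDet implementation."""
    # L2-normalize so dot products equal cosine similarities.
    t = template_feats / np.linalg.norm(template_feats, axis=1, keepdims=True)
    p = proposal_feats / np.linalg.norm(proposal_feats, axis=1, keepdims=True)
    sim = p @ t.T               # (num_proposals, num_templates)
    return sim.max(axis=1)      # best-matching template score per proposal

# Toy example: 3 proposals, 2 templates in a 4-D feature space.
templates = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0]])
proposals = np.array([[0.9, 0.1, 0.0, 0.0],   # close to template 0
                      [0.0, 0.0, 1.0, 0.0],   # matches no template
                      [0.1, 0.9, 0.0, 0.0]])  # close to template 1
scores = match_proposals(templates, proposals)
```

Proposals whose embeddings resemble any template view receive high scores, so the detector needs no class labels, only the reference video of the target object.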