GFreeDet: Exploiting Gaussian Splatting and Foundation Models for Model-free Unseen Object Detection in the BOP Challenge 2024

📅 2024-12-02
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work addresses model-free, zero-shot 2D detection of unseen objects in unconstrained scenes. It introduces a framework that integrates Gaussian splatting with vision foundation models (VFMs): object geometry is reconstructed from reference video via Gaussian splatting, while features distilled from VFMs such as SAM and CLIP supply semantics, yielding a joint geometric-semantic representation that removes the reliance on CAD templates or predefined 3D models. The framework enables category-agnostic object localization from a single reference video. On the BOP-H3 benchmark it performs comparably to CAD-based methods, and in the model-free 2D detection track of the BOP Challenge 2024 it won both the best overall method and the best fast method awards. These results support the viability of model-free detection for mixed reality (MR) applications.

📝 Abstract
We present GFreeDet, an unseen object detection approach that leverages Gaussian splatting and vision foundation models in a model-free setting. Unlike existing methods that rely on predefined CAD templates, GFreeDet reconstructs objects directly from reference videos using Gaussian splatting, enabling robust detection of novel objects without prior 3D models. Evaluated on the BOP-H3 benchmark, GFreeDet achieves performance comparable to CAD-based methods, demonstrating the viability of model-free detection for mixed reality (MR) applications. Notably, GFreeDet won the best overall method and the best fast method awards in the model-free 2D detection track at BOP Challenge 2024.
Problem

Research questions and friction points this paper is trying to address.

Detecting unseen objects without predefined CAD models
Combining Gaussian splatting with vision foundation models
Achieving robust detection for mixed reality applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Gaussian splatting for object reconstruction
Leverages vision foundation models
Detects unseen objects without CAD templates
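
The points above can be illustrated with a generic descriptor-matching sketch. It is not the authors' implementation: the page does not say how detections are scored, so the cosine-similarity matching, the top-k averaging, the threshold, and every name below (`match_proposals`, `score_object`, `templates_by_object`) are assumptions made for illustration. The toy vectors stand in for foundation-model descriptors of views rendered from a reference-video reconstruction (the templates) and of candidate regions in a test image (the proposals).

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_object(proposal: Vector, templates: List[Vector], top_k: int = 2) -> float:
    """Average the top-k similarities between a proposal and one object's templates."""
    sims = sorted((cosine_similarity(proposal, t) for t in templates), reverse=True)
    k = min(top_k, len(sims))
    return sum(sims[:k]) / k if k else 0.0


def match_proposals(
    proposals: List[Vector],
    templates_by_object: Dict[str, List[Vector]],
    threshold: float = 0.5,
    top_k: int = 2,
) -> List[Tuple[int, str, float]]:
    """Assign each proposal to its best-scoring object; drop scores below the threshold."""
    detections = []
    for idx, descriptor in enumerate(proposals):
        best_id, best_score = None, -1.0
        for object_id, templates in templates_by_object.items():
            score = score_object(descriptor, templates, top_k)
            if score > best_score:
                best_id, best_score = object_id, score
        if best_id is not None and best_score >= threshold:
            detections.append((idx, best_id, best_score))
    return detections


if __name__ == "__main__":
    # Hypothetical 3-D descriptors; real descriptors would be much higher-dimensional.
    templates_by_object = {
        "mug": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
        "box": [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],
    }
    proposals = [[0.95, 0.05, 0.0], [0.0, 0.0, 1.0]]
    for idx, object_id, score in match_proposals(proposals, templates_by_object):
        print(f"proposal {idx} -> {object_id} (score {score:.3f})")
```

In this toy run the first proposal is assigned to `mug`, while the second resembles neither object closely enough and is discarded. In a model-free setting only the source of the templates would change (views rendered from a video-based reconstruction rather than from a CAD model); the matching step itself could stay the same.
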
Xingyu Liu, Tsinghua University
Yingyue Li, Tsinghua University
Chengxi Li, Tsinghua University
Gu Wang, Tsinghua University (Vision in Robotics, 3D Vision, Pose Estimation)
Chenyangguang Zhang, ETH Zürich (3D computer vision, robotic perception)
Ziqin Huang, Tsinghua University
Xiangyang Ji, Tsinghua University