🤖 AI Summary
We tackle zero-shot 6D pose estimation for unseen objects, requiring no object-specific annotations or model fine-tuning. Our method is a training-free, cross-modal inference framework that couples a vision-language foundation model (CLIP) with a differentiable Signed Distance Function (SDF) geometric representation, yielding an end-to-end differentiable pose optimization pipeline that integrates semantic and geometric cues. Given only a single RGB image and a CAD model of the object, it directly estimates an accurate 6D pose with no supervised training. On standard benchmarks including NOCS and OmniObject3D, the method achieves state-of-the-art zero-shot performance at 20 FPS, and generalizes markedly better to unseen categories, arbitrary viewpoints, and heavily occluded scenes, all without task-specific adaptation or retraining.
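To give a feel for the geometric half of the pipeline, here is a minimal, hypothetical sketch of SDF-based differentiable pose refinement. It is not the paper's implementation: it uses an analytic sphere SDF instead of a CAD model, optimizes translation only (the actual method recovers full 6D pose), uses synthetic surface points in place of image-derived observations, and omits the CLIP semantic term entirely. All function names are illustrative.

```python
import numpy as np

def sdf_loss_and_grad(t, points, r=1.0):
    # Loss: mean squared signed distance of observed surface points,
    # expressed in the object frame (translation-only pose, for brevity).
    # The object model is a sphere of radius r centered at the origin,
    # so sdf(x) = ||x|| - r (a stand-in for a CAD-derived SDF).
    local = points - t                      # (N, 3) points in object frame
    d = np.linalg.norm(local, axis=-1)      # distance of each point to center
    sdf = d - r
    loss = np.mean(sdf ** 2)
    # Analytic gradient of the loss w.r.t. the translation t; the paper's
    # pipeline would obtain this via automatic differentiation instead.
    grad = np.mean(2.0 * sdf[:, None] * (-local / d[:, None]), axis=0)
    return loss, grad

def estimate_translation(points, t0, r=1.0, lr=0.5, steps=200):
    # Plain gradient descent standing in for the differentiable pose
    # optimization loop described in the summary.
    t = np.asarray(t0, dtype=float)
    for _ in range(steps):
        _, grad = sdf_loss_and_grad(t, points, r)
        t = t - lr * grad
    return t

# Synthetic observation: points on a unit sphere centered at t_true.
rng = np.random.default_rng(0)
dirs = rng.normal(size=(500, 3))
dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
t_true = np.array([0.3, -0.2, 2.0])
points = t_true + dirs
t_est = estimate_translation(points, t0=[0.0, 0.0, 1.0])
print(np.round(t_est, 3))
```

Driving the pose by the gradient of an SDF loss is what makes the pipeline end-to-end differentiable; in the full method the same loop would also carry a rotation parameterization and a CLIP-based semantic alignment term.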