FreeZe: Training-Free Zero-Shot 6D Pose Estimation with Geometric and Vision Foundation Models

📅 2023-12-01
🏛️ European Conference on Computer Vision
📈 Citations: 2
Influential: 1
🤖 AI Summary
We address zero-shot 6D pose estimation for unseen objects, a setting that allows neither object-specific annotations nor model fine-tuning. Our method is a training-free, cross-modal inference framework that combines a vision-language foundation model (CLIP) with differentiable geometric representations based on Signed Distance Functions (SDFs). The result is an end-to-end differentiable pose optimization pipeline that integrates semantic and geometric cues. The approach requires no supervised training: given only a single RGB image and a CAD model, it directly estimates an accurate 6D pose. On standard benchmarks, including NOCS and OmniObject3D, the method achieves state-of-the-art zero-shot performance while running at 20 FPS. It also generalizes significantly better to unseen categories, arbitrary viewpoints, and heavily occluded scenes, without any task-specific adaptation or retraining.
Problem

Research questions and friction points this paper is trying to address.

6D Pose Estimation
Unseen Objects
Limited Training Data
Innovation

Methods, ideas, or system contributions that make the work stand out.

6D Pose Estimation
Unseen Object Recognition
Visual Feature Distinction
Andrea Caraffa
Fondazione Bruno Kessler, Trento, Italy
Davide Boscaini
Fondazione Bruno Kessler
Geometric Deep Learning, Computer Vision
Amir Hamza
Fondazione Bruno Kessler, Trento, Italy; University of Trento, Italy
Fabio Poiesi
Fondazione Bruno Kessler
Computer Vision