OmniShape: Zero-Shot Multi-Hypothesis Shape and Pose Estimation in the Real World

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses joint zero-shot 6D pose and complete 3D shape estimation from a single image, without relying on predefined 3D models or category-level priors. To this end, the authors propose OmniShape, a framework that decouples shape completion into two multi-modal distributions: one capturing how measurements project into a normalized object reference frame, and one modelling a prior over object geometries. Separate conditional diffusion models are trained for these two distributions, enabling sampling of multiple pose–shape hypotheses from their joint distribution. Object geometry is represented with triplanar neural fields, and perspective projection is modelled in the normalized reference frame to enforce geometric consistency. The authors present OmniShape as the first method to support zero-shot, multi-hypothesis pose and shape inference. Extensive experiments on multiple real-world datasets demonstrate significant improvements over state-of-the-art approaches, yielding diverse, physically plausible, and geometrically consistent pose and shape predictions.

📝 Abstract
We would like to estimate the pose and full shape of an object from a single observation, without assuming a known 3D model or category. In this work, we propose OmniShape, the first method of its kind to enable probabilistic pose and shape estimation. OmniShape is based on the key insight that shape completion can be decoupled into two multi-modal distributions: one capturing how measurements project into a normalized object reference frame defined by the dataset, and the other modelling a prior over object geometries represented as triplanar neural fields. By training separate conditional diffusion models for these two distributions, we enable sampling multiple hypotheses from the joint pose and shape distribution. OmniShape demonstrates compelling performance on challenging real-world datasets. Project website: https://tri-ml.github.io/omnishape
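The two-stage sampling scheme the abstract describes can be sketched in code. The snippet below is a minimal illustration, not the paper's implementation: the denoising networks are untrained toy stand-ins, the noise schedule and all tensor sizes (`image_feat`, the triplane latent shape, `K` hypotheses) are illustrative assumptions. It shows only the structure — sample a projected-measurement representation conditioned on the image, then sample a triplane shape latent conditioned on that, repeating to draw multiple hypotheses.

```python
import numpy as np

def ddpm_sample(denoise_fn, cond, shape, steps=50, rng=None):
    """Minimal DDPM-style ancestral sampling loop with a toy linear schedule.

    `denoise_fn(x_t, t, cond)` predicts the noise at step t; here it is a
    stand-in for a learned conditional diffusion model.
    """
    rng = rng or np.random.default_rng(0)
    betas = np.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)
    for t in reversed(range(steps)):
        eps = denoise_fn(x, t, cond)
        # posterior mean of x_{t-1} given the predicted noise
        x = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:  # add noise on every step except the last
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# Hypothetical, untrained stand-ins for the two learned conditional models:
proj_model = lambda x, t, cond: x - cond          # "measurement projection" model
shape_model = lambda x, t, cond: x - cond.mean()  # "geometry prior" model

image_feat = np.zeros(8)  # pretend single-image encoding (illustrative)
K = 4                     # number of hypotheses to draw
hypotheses = []
for k in range(K):
    rng = np.random.default_rng(k)
    # Stage 1: sample a projected-measurement map in the normalized frame
    nocs = ddpm_sample(proj_model, image_feat, shape=(8,), rng=rng)
    # Stage 2: sample a triplane shape latent conditioned on stage 1
    triplane = ddpm_sample(shape_model, nocs, shape=(3, 4, 4), rng=rng)
    hypotheses.append((nocs, triplane))
print(len(hypotheses), hypotheses[0][1].shape)  # 4 (3, 4, 4)
```

Because the two stages are sampled sequentially, each hypothesis is a coherent draw from the joint pose–shape distribution rather than independent draws from the marginals.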
Problem

Research questions and friction points this paper is trying to address.

Estimate object pose and full shape from a single observation, without a known 3D model or category
Decouple shape completion into two multi-modal distributions
Sample multiple hypotheses from the joint pose and shape distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples shape completion into two multi-modal distributions
Trains separate conditional diffusion models to sample hypotheses
Represents object geometries as triplanar neural fields
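The triplanar representation mentioned above can be illustrated with a short sketch. This is a generic triplane query, not the paper's architecture: a 3D point is projected onto three axis-aligned feature planes, features are read off each plane with bilinear interpolation and summed. The plane resolution, channel count, and the absence of an MLP decoder are all simplifying assumptions for illustration.

```python
import numpy as np

def query_triplane(planes, pts):
    """Query a triplanar feature field at 3D points in [-1, 1]^3.

    `planes` is a (3, C, R, R) array holding the XY, XZ, and YZ feature
    planes. Each point is projected onto each plane and features are
    gathered by bilinear interpolation, then summed across planes. A
    full model would decode the summed feature with a small MLP.
    """
    C, R = planes.shape[1], planes.shape[2]
    pairs = [(0, 1), (0, 2), (1, 2)]  # coordinate pair per plane
    feats = np.zeros((pts.shape[0], C))
    for plane, (a, b) in zip(planes, pairs):
        # map coordinates from [-1, 1] to pixel space [0, R-1]
        u = (pts[:, a] + 1) * 0.5 * (R - 1)
        v = (pts[:, b] + 1) * 0.5 * (R - 1)
        u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
        u1, v1 = np.clip(u0 + 1, 0, R - 1), np.clip(v0 + 1, 0, R - 1)
        du, dv = u - u0, v - v0
        # bilinear blend of the four neighbouring texels
        feats += (plane[:, u0, v0] * (1 - du) * (1 - dv)
                  + plane[:, u1, v0] * du * (1 - dv)
                  + plane[:, u0, v1] * (1 - du) * dv
                  + plane[:, u1, v1] * du * dv).T
    return feats

planes = np.random.default_rng(0).standard_normal((3, 16, 32, 32))
pts = np.array([[0.0, 0.0, 0.0], [0.5, -0.25, 0.75]])
print(query_triplane(planes, pts).shape)  # (2, 16)
```

Triplanes trade the cubic memory cost of a dense voxel grid for three 2D planes, which is what makes them a convenient target for a 2D-style diffusion prior over shapes.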