Differentiable Inverse Graphics for Zero-shot Scene Reconstruction and Robot Grasping

📅 2026-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods struggle to achieve zero-shot scene reconstruction and grasping of unseen objects in unknown environments. This work proposes a novel approach that requires neither training data nor test-time sampling, integrating neural priors with physically differentiable rendering to jointly estimate object geometry, material, illumination, and 6D pose from a single RGB-D image and bounding box via constrained optimization. To the best of our knowledge, this is the first method to enable physically consistent zero-shot reconstruction without relying on object-specific models or demonstrations. The approach outperforms state-of-the-art techniques on model-free, few-shot pose estimation benchmarks and successfully enables zero-shot robotic grasping, significantly enhancing generalization and interpretability in novel environments.

📝 Abstract
Operating effectively in novel real-world environments requires robotic systems to estimate and interact with previously unseen objects. Current state-of-the-art models address this challenge by using large amounts of training data and test-time samples to build black-box scene representations. In this work, we introduce a differentiable neuro-graphics model that combines neural foundation models with physics-based differentiable rendering to perform zero-shot scene reconstruction and robot grasping without relying on any additional 3D data or test-time samples. Our model solves a series of constrained optimization problems to estimate physically consistent scene parameters, such as meshes, lighting conditions, material properties, and 6D poses of previously unseen objects, from a single RGB-D image and bounding boxes. We evaluated our approach on standard model-free few-shot benchmarks and demonstrated that it outperforms existing algorithms for model-free few-shot pose estimation. Furthermore, we validated the accuracy of our scene reconstructions by applying our algorithm to a zero-shot grasping task. By enabling zero-shot, physically consistent scene reconstruction and grasping without reliance on extensive datasets or test-time sampling, our approach offers a pathway towards more data-efficient, interpretable, and generalizable robot autonomy in novel environments.
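The core idea the abstract describes, treating rendering as a differentiable function of scene parameters and recovering those parameters by gradient-based optimization against an observed image, can be illustrated with a deliberately minimal sketch. Here the "scene" is a 3D point set, the only unknown is a z-translation, and the "renderer" simply reads off depth values; the function names and this 1D setup are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def render_depth(points, t):
    """Toy differentiable 'renderer': translate points by t and
    return their depth (z) values. Stand-in for a physics-based
    differentiable renderer; purely illustrative."""
    return points[:, 2] + t[2]

def estimate_translation(points, observed_depth, lr=0.1, steps=200):
    """Recover the unknown z-translation by gradient descent on a
    photometric-style squared-error loss between rendered and
    observed depth. Hypothetical helper, not the paper's API."""
    t = np.zeros(3)
    for _ in range(steps):
        residual = render_depth(points, t) - observed_depth
        # Analytic gradient of 0.5 * mean(residual**2) w.r.t. t_z.
        t[2] -= lr * residual.mean()
    return t
```

In the paper's setting the same loop would run over meshes, materials, lighting, and 6D poses with a real differentiable renderer in place of `render_depth`, and with physical constraints (e.g. non-penetration) restricting the feasible parameter set.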
Problem

Research questions and friction points this paper is trying to address.

zero-shot scene reconstruction
robot grasping
unseen objects
data efficiency
novel environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

differentiable rendering
zero-shot scene reconstruction
neural graphics
robot grasping
physics-based optimization
Octavio Arriaga (University of Bremen, Machine Learning)
Proneet Sharma (Robotics Innovation Center, DFKI GmbH)
Jichen Guo
Marc Otto (Robotics Innovation Center, DFKI GmbH)
Siddhant Kadwe
Rebecca Adam (Robotics Innovation Center, DFKI GmbH)