One-Shot Real-to-Sim via End-to-End Differentiable Simulation and Rendering

📅 2024-11-29
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing methods struggle to jointly model geometry, appearance, and physical properties when building high-fidelity, simulatable, and renderable world models of novel environments from a single real-world robotic interaction sequence comprising both visual and tactile observations. Method: we propose the first end-to-end jointly optimized framework: a differentiable point-based geometric representation captures scene structure; a voxelized appearance field enables photorealistic rendering; and differentiable collision detection coupled with physics simulation ensures dynamical consistency. Contribution/Results: our approach achieves the first unified rigid-body representation that co-optimizes geometry, appearance, and physics. Experiments demonstrate that a single real interaction suffices to reconstruct a world model capable of both forward simulation and real-time rendering, and the resulting model significantly outperforms state-of-the-art single-modality methods in fidelity and cross-environment generalization.

📝 Abstract
Identifying predictive world models from sparse online observations is essential for robot task planning and execution in novel environments. However, existing methods that leverage differentiable programming to identify world models are incapable of jointly optimizing the geometry, appearance, and physical properties of the scene. In this work, we introduce a novel rigid object representation that allows the joint identification of these properties. Our method employs a differentiable point-based geometry representation coupled with a grid-based appearance field, which allows differentiable object collision detection and rendering. Combined with a differentiable physical simulator, we achieve end-to-end optimization of world models, given the sparse visual and tactile observations of a physical motion sequence. Through a series of world model identification tasks in simulated and real environments, we show that our method can learn both simulation- and rendering-ready world models from only one robot action sequence.
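The abstract's core idea, identifying physical parameters by differentiating a simulated rollout against observations, can be sketched in miniature. The toy below is our illustration, not the paper's pipeline: it fits a Coulomb friction coefficient to an "observed" 1-D sliding trajectory, with a central finite difference standing in for the analytic gradients a differentiable simulator would provide. All names and constants here are hypothetical.

```python
import numpy as np

def simulate(mu, v0=2.0, dt=0.01, steps=100, g=9.81):
    """Roll out a 1-D block decelerating under Coulomb friction mu.
    Returns the position at every step; the rollout is a piecewise-smooth
    function of mu, which is what makes gradient-based identification work."""
    x, v = 0.0, v0
    xs = []
    for _ in range(steps):
        v = max(v - mu * g * dt, 0.0)  # friction only acts while sliding
        x += v * dt
        xs.append(x)
    return np.array(xs)

def loss(mu, observed):
    """Trajectory-matching loss between a candidate rollout and observations."""
    return float(np.sum((simulate(mu) - observed) ** 2))

# "Observed" trajectory generated with a ground-truth friction value.
mu_true = 0.3
observed = simulate(mu_true)

# Recover mu by gradient descent on the trajectory-matching loss.
mu, lr, eps = 0.1, 5e-4, 1e-5
for _ in range(500):
    grad = (loss(mu + eps, observed) - loss(mu - eps, observed)) / (2 * eps)
    mu -= lr * grad
```

In the paper's setting the optimized variables additionally include geometry and appearance, and the loss compares rendered images and simulated contact forces against the real visual and tactile observations; the descent loop, however, has the same shape.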
Problem

Research questions and friction points this paper is trying to address.

Identify predictive world models for robots in novel environments
Jointly optimize geometry, appearance, and physical properties of scenes
Learn simulation- and rendering-ready world models from sparse observations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Differentiable point-based geometry representation
Grid-based appearance field for rendering
End-to-end optimization with physical simulator
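One building block above, the grid-based appearance field, amounts to a voxel grid queried with a smooth interpolation so gradients can flow to both the stored values and the query points. A minimal sketch, assuming a dense RGB voxel grid and trilinear interpolation (the paper's exact scheme may differ):

```python
import numpy as np

def trilinear(grid, pts):
    """Query a dense voxel grid of RGB values at continuous 3-D points.
    grid: (D, H, W, 3) array; pts: (N, 3) in voxel coordinates.
    The eight corner weights are polynomial in the fractional offsets,
    so the lookup is differentiable in both grid values and positions."""
    lo = np.floor(pts).astype(int)
    f = pts - lo                      # fractional offsets in [0, 1)
    out = np.zeros((len(pts), 3))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (np.where(dx, f[:, 0], 1 - f[:, 0])
                     * np.where(dy, f[:, 1], 1 - f[:, 1])
                     * np.where(dz, f[:, 2], 1 - f[:, 2]))
                out += w[:, None] * grid[lo[:, 0] + dx,
                                         lo[:, 1] + dy,
                                         lo[:, 2] + dz]
    return out

grid = np.random.default_rng(0).random((4, 4, 4, 3))
pts = np.array([[1.25, 2.0, 2.5], [0.5, 0.5, 0.5]])
colors = trilinear(grid, pts)
```

Because the weights are smooth in `pts`, an autodiff framework could propagate a rendering loss into the grid values and the queried surface points directly, which is what makes this representation compatible with the end-to-end optimization listed above.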
👥 Authors
Yifan Zhu, Beijing University of Posts and Telecommunications (PEFT of LLMs, Graph RAG, Graph mining)
Tianyi Xiang, PhD, City University of Hong Kong (Computer Vision, Machine Learning)
Aaron Dollar, Department of Mechanical Engineering and Materials Science, Yale University, New Haven, United States
Zherong Pan, Lightspeed Studios, Tencent America, Bellevue, WA, United States