One-Shot Real-to-Sim via End-to-End Differentiable Simulation and Rendering

📅 2024-11-29
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing methods struggle to jointly model geometry, appearance, and physical properties when building high-fidelity, simulatable, and renderable world models of novel environments from a single real-world robotic interaction sequence comprising both visual and tactile observations. Method: we propose the first end-to-end jointly optimized framework: a differentiable point-based geometric representation captures scene structure; a voxelized appearance field enables photorealistic rendering; and differentiable collision detection coupled with physics simulation ensures dynamical consistency. Contribution/Results: our approach achieves the first unified rigid-body representation that co-optimizes geometry, appearance, and physics. Experiments demonstrate that a single real interaction suffices to reconstruct a world model capable of both forward simulation and real-time rendering, and the resulting model significantly outperforms state-of-the-art single-modality methods in fidelity and cross-environment generalization.

📝 Abstract
Identifying predictive world models from sparse online observations is essential for robot task planning and execution in novel environments. However, existing methods that leverage differentiable programming to identify world models are incapable of jointly optimizing the geometry, appearance, and physical properties of the scene. In this work, we introduce a novel rigid object representation that allows the joint identification of these properties. Our method employs a differentiable point-based geometry representation coupled with a grid-based appearance field, which allows differentiable object collision detection and rendering. Combined with a differentiable physical simulator, we achieve end-to-end optimization of world models, given the sparse visual and tactile observations of a physical motion sequence. Through a series of world model identification tasks in simulated and real environments, we show that our method can learn both simulation- and rendering-ready world models from only one robot action sequence.
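The abstract's core idea, identifying physical parameters by differentiating a simulated rollout against observations, can be sketched in miniature. The toy below is our illustration, not the paper's pipeline: it fits a Coulomb friction coefficient to an "observed" 1-D sliding trajectory, with a central finite difference standing in for the analytic gradients a differentiable simulator would provide. All names and constants here are hypothetical.

```python
import numpy as np

def simulate(mu, v0=2.0, dt=0.01, steps=100, g=9.81):
    """Roll out a 1-D block decelerating under Coulomb friction mu.
    Returns the position at every step; the rollout is a piecewise-smooth
    function of mu, which is what makes gradient-based identification work."""
    x, v = 0.0, v0
    xs = []
    for _ in range(steps):
        v = max(v - mu * g * dt, 0.0)  # friction only acts while sliding
        x += v * dt
        xs.append(x)
    return np.array(xs)

def loss(mu, observed):
    """Trajectory-matching loss between a candidate rollout and observations."""
    return float(np.sum((simulate(mu) - observed) ** 2))

# "Observed" trajectory generated with a ground-truth friction value.
mu_true = 0.3
observed = simulate(mu_true)

# Recover mu by gradient descent on the trajectory-matching loss.
mu, lr, eps = 0.1, 5e-4, 1e-5
for _ in range(500):
    grad = (loss(mu + eps, observed) - loss(mu - eps, observed)) / (2 * eps)
    mu -= lr * grad
```

In the paper's setting the optimized variables additionally include geometry and appearance, and the loss compares rendered images and simulated contact forces against the real visual and tactile observations; the descent loop, however, has the same shape.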
Problem

Research questions and friction points this paper is trying to address.

Identify predictive world models for robots in novel environments
Jointly optimize geometry, appearance, and physical properties of scenes
Learn simulation- and rendering-ready world models from sparse observations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Differentiable point-based geometry representation
Grid-based appearance field for rendering
End-to-end optimization with physical simulator
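One building block above, the grid-based appearance field, amounts to a voxel grid queried with a smooth interpolation so gradients can flow to both the stored values and the query points. A minimal sketch, assuming a dense RGB voxel grid and trilinear interpolation (the paper's exact scheme may differ):

```python
import numpy as np

def trilinear(grid, pts):
    """Query a dense voxel grid of RGB values at continuous 3-D points.
    grid: (D, H, W, 3) array; pts: (N, 3) in voxel coordinates.
    The eight corner weights are polynomial in the fractional offsets,
    so the lookup is differentiable in both grid values and positions."""
    lo = np.floor(pts).astype(int)
    f = pts - lo                      # fractional offsets in [0, 1)
    out = np.zeros((len(pts), 3))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (np.where(dx, f[:, 0], 1 - f[:, 0])
                     * np.where(dy, f[:, 1], 1 - f[:, 1])
                     * np.where(dz, f[:, 2], 1 - f[:, 2]))
                out += w[:, None] * grid[lo[:, 0] + dx,
                                         lo[:, 1] + dy,
                                         lo[:, 2] + dz]
    return out

grid = np.random.default_rng(0).random((4, 4, 4, 3))
pts = np.array([[1.25, 2.0, 2.5], [0.5, 0.5, 0.5]])
colors = trilinear(grid, pts)
```

Because the weights are smooth in `pts`, an autodiff framework could propagate a rendering loss into the grid values and the queried surface points directly, which is what makes this representation compatible with the end-to-end optimization listed above.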
👥 Authors
Yifan Zhu, Beijing University of Posts and Telecommunications (PEFT of LLMs, Graph RAG, Graph mining)
Tianyi Xiang, PhD, City University of Hong Kong (Computer Vision, Machine Learning)
Aaron Dollar, Department of Mechanical Engineering and Materials Science, Yale University, New Haven, United States
Zherong Pan, Lightspeed Studios, Tencent America, Bellevue, WA, United States