🤖 AI Summary
This work addresses the challenge of constructing high-fidelity digital twins from real-robot trajectories, where severe occlusions, noisy camera poses, and strong dynamic disturbances impede accurate modeling. We propose an end-to-end physics–vision co-optimization framework. Methodologically, we introduce a hybrid scene representation integrating 3D Gaussian splatting (for appearance modeling) with explicit physical object meshes (for geometry and physics modeling). Our approach jointly optimizes geometry, appearance, robot pose, and rigid-body physical parameters—enabling unsupervised pose calibration and high-fidelity reconstruction. By unifying differentiable rendering with the differentiable MuJoCo physics engine, we achieve tight coupling between visual and physical optimization. Evaluated on the ALOHA 2 bimanual platform, our method achieves millimeter-accurate object mesh reconstruction, high-quality novel-view synthesis, and annotation-free pose calibration—significantly improving geometric fidelity, physical plausibility, and visual realism in real-to-simulation transfer.
📝 Abstract
Creating accurate physical simulations directly from real-world robot motion holds great value for safe, scalable, and affordable robot learning, yet remains exceptionally challenging. Real robot data suffers from occlusions, noisy camera poses, and dynamic scene elements, which hinder the creation of geometrically accurate and photorealistic digital twins of unseen objects. We introduce a novel real-to-sim framework tackling all these challenges at once. Our key insight is a hybrid scene representation merging the photorealistic rendering of 3D Gaussian Splatting with explicit object meshes suitable for physics simulation within a single representation. We propose an end-to-end optimization pipeline that leverages differentiable rendering and differentiable physics within MuJoCo to jointly refine all scene components - from object geometry and appearance to robot poses and physical parameters - directly from raw and imprecise robot trajectories. This unified optimization allows us to simultaneously achieve high-fidelity object mesh reconstruction, generate photorealistic novel views, and perform annotation-free robot pose calibration. We demonstrate the effectiveness of our approach both in simulation and on challenging real-world sequences using an ALOHA 2 bimanual manipulator, enabling more practical and robust real-to-simulation pipelines.
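The core idea of the joint optimization can be illustrated with a deliberately tiny toy problem (our own sketch, not the paper's implementation, which uses differentiable 3D Gaussian Splatting rendering and differentiable MuJoCo): a shared set of parameters receives gradients from both a "visual" loss (rendered vs. observed object position) and a "physics" loss (simulated vs. observed next state), so appearance evidence and dynamics evidence constrain the same variables simultaneously.

```python
# Toy physics-vision co-optimization sketch (illustrative assumption, not the
# paper's code): jointly recover an object's 2-D position and its mass by
# descending the sum of a visual residual and a physics residual. All
# quantities and the one-step ballistic "simulator" are hypothetical.

def co_optimize(obs_pos, obs_next, force, dt, steps=500, lr=0.05):
    px, py = 0.0, 0.0       # initial guess for object position
    m = 2.0                 # initial guess for object mass
    c = 0.5 * dt * dt       # displacement per unit acceleration in one step
    for _ in range(steps):
        # visual residual: where we "render" the object vs. where we see it
        rvx, rvy = px - obs_pos[0], py - obs_pos[1]
        # physics residual: one ballistic step under acceleration force / m
        rpx = px + c * force[0] / m - obs_next[0]
        rpy = py + c * force[1] / m - obs_next[1]
        # analytic gradients of the summed squared residuals
        gx = 2 * rvx + 2 * rpx
        gy = 2 * rvy + 2 * rpy
        gm = 2 * rpx * (-c * force[0] / m**2) + 2 * rpy * (-c * force[1] / m**2)
        px -= lr * gx
        py -= lr * gy
        m = max(m - lr * gm, 1e-3)  # keep mass physically positive
    return (px, py), m

# With consistent observations (true position (1, 2), true mass 1.0 under
# gravity for dt = 0.5 s), both parameter groups converge to the truth.
pos, mass = co_optimize((1.0, 2.0), (1.0, 0.775), (0.0, -9.8), 0.5)
```

The same structure scales up in the paper's setting: the "visual" term becomes a differentiable-rasterization photometric loss over Gaussians, the "physics" term becomes a trajectory loss through the differentiable simulator, and the shared parameters include mesh geometry, robot poses, and rigid-body properties.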