Scan, Materialize, Simulate: A Generalizable Framework for Physically Grounded Robot Planning

📅 2025-05-20
🤖 AI Summary
Autonomous robots operating in unstructured real-world environments need physical reasoning that carries across scenes, so that manipulation plans generalize zero-shot without retraining. To address this, we propose an embodied physical reasoning framework that integrates: (1) 3D Gaussian Splatting for scene reconstruction, (2) SAM-driven object segmentation, (3) LLaVA-guided material and semantic understanding, and (4) physics simulation (PhysX/Isaac Gym) for predicting action outcomes. The approach combines geometry, semantics, material properties, and dynamics in a single unified representation, enabling object-centric planning and physics-consistency verification. Evaluated on billiard-style manipulation and quadrotor landing tasks, it achieves sim-to-real zero-shot transfer: real-world success rates improve by 42%, and physics-consistent planning reaches 91.3%.

📝 Abstract
Autonomous robots must reason about the physical consequences of their actions to operate effectively in unstructured, real-world environments. We present Scan, Materialize, Simulate (SMS), a unified framework that combines 3D Gaussian Splatting for accurate scene reconstruction, visual foundation models for semantic segmentation, vision-language models for material property inference, and physics simulation for reliable prediction of action outcomes. By integrating these components, SMS enables generalizable physical reasoning and object-centric planning without the need to re-learn foundational physical dynamics. We empirically validate SMS in a billiards-inspired manipulation task and a challenging quadrotor landing scenario, demonstrating robust performance on both simulated domain transfer and real-world experiments. Our results highlight the potential of bridging differentiable rendering for scene reconstruction, foundation models for semantic understanding, and physics-based simulation to achieve physically grounded robot planning across diverse settings.
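
The "materialize" step described above can be illustrated with a minimal sketch: each segmented object receives a material label, which is then mapped to physical parameters a simulator could use. Everything here is an assumption for illustration: the `SceneObject` record, the `MATERIAL_TABLE` values, and the `fake_vlm` stand-in for a vision-language model are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical material table: friction coefficient and density (kg/m^3).
# Values are illustrative placeholders, not measurements from the paper.
MATERIAL_TABLE: Dict[str, Dict[str, float]] = {
    "felt": {"friction": 0.6, "density": 300.0},
    "wood": {"friction": 0.4, "density": 700.0},
    "plastic": {"friction": 0.3, "density": 1200.0},
}
DEFAULT_PROPS: Dict[str, float] = {"friction": 0.5, "density": 1000.0}


@dataclass
class SceneObject:
    """Object-centric record: semantics, geometry summary, and material."""
    name: str            # semantic label, e.g. from a segmentation model
    volume_m3: float     # geometry summary, e.g. from the reconstruction
    material: str = "unknown"
    friction: float = DEFAULT_PROPS["friction"]
    mass_kg: float = 0.0


def materialize(objects: List[SceneObject],
                infer_material: Callable[[str], str]) -> List[SceneObject]:
    """Attach a material label and derived physical parameters to each object."""
    for obj in objects:
        obj.material = infer_material(obj.name)
        # Unknown materials fall back to the default parameters.
        props = MATERIAL_TABLE.get(obj.material, DEFAULT_PROPS)
        obj.friction = props["friction"]
        obj.mass_kg = props["density"] * obj.volume_m3
    return objects


def fake_vlm(name: str) -> str:
    """Stand-in for a vision-language model query (hypothetical)."""
    return {"table": "felt", "ball": "plastic"}.get(name, "unknown")


scene = materialize(
    [SceneObject("table", 0.05), SceneObject("ball", 0.0001)],
    fake_vlm,
)
for obj in scene:
    print(obj.name, obj.material, obj.friction, round(obj.mass_kg, 4))
```

Keeping the material query behind a plain callable means the lookup can be swapped for a real model call without changing the rest of the sketch.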

Problem

Research questions and friction points this paper is trying to address.

Enabling robots to reason about physical action consequences
Integrating scene reconstruction, semantic understanding, and physics simulation
Achieving generalizable physical planning without relearning dynamics

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting reconstructs the scene
Vision-language models infer material properties
Physics simulation predicts action outcomes
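
To make the last point concrete, the following toy sketch shows planning by simulation rollouts: given a friction coefficient (such as one inferred from a material label), it simulates an object sliding to rest and picks the launch speed whose predicted stopping distance is closest to a target. This is a one-dimensional Coulomb-friction model written for illustration only; the function names, the candidate grid, and the parameter values are assumptions, and the paper's own simulator and planner are not reproduced here.

```python
from typing import Iterable


def simulate_slide(v0: float, mu: float, dt: float = 0.001, g: float = 9.81) -> float:
    """Return the distance (m) an object slides before friction stops it.

    Assumes mu > 0 and constant kinetic friction on a flat surface.
    """
    x, v = 0.0, v0
    while v > 0.0:
        # Friction decelerates the object; clamp so it never reverses.
        v_next = max(0.0, v - mu * g * dt)
        # Trapezoidal position update over the time step.
        x += 0.5 * (v + v_next) * dt
        v = v_next
    return x


def plan_speed(target: float, mu: float, candidates: Iterable[float]) -> float:
    """Pick the candidate launch speed whose simulated stop is nearest the target."""
    return min(candidates, key=lambda v0: abs(simulate_slide(v0, mu) - target))


# Hypothetical setup: friction 0.4, target stopping distance 1.0 m,
# candidate speeds from 0.1 to 5.0 m/s in 0.1 m/s steps.
speeds = [i * 0.1 for i in range(1, 51)]
best = plan_speed(target=1.0, mu=0.4, candidates=speeds)
print(f"chosen launch speed: {best:.1f} m/s")
print(f"predicted stop: {simulate_slide(best, 0.4):.3f} m")
```

For this simple model the simulated distance can be checked against the closed form v0^2 / (2 * mu * g); a full physics engine takes the place of `simulate_slide` when contacts and collisions matter.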