Scan, Materialize, Simulate: A Generalizable Framework for Physically Grounded Robot Planning

📅 2025-05-20

📈 Citations: 0

✨ Influential: 0

career value

207K/year

🤖 AI Summary

Autonomous robots operating in unstructured real-world environments require cross-scene physical reasoning capabilities to enable zero-shot generalization in manipulation planning without retraining. To address this, we propose an end-to-end embodied physical reasoning framework that jointly integrates: (1) 3D Gaussian Splatting for scene reconstruction, (2) SAM-driven object segmentation, (3) LLaVA-guided material and semantic understanding, and (4) differentiable physics simulation (PhysX/Isaac Gym) for joint optimization. Our approach establishes the first unified multimodal model integrating geometry, semantics, material properties, and dynamics—enabling object-centric planning and physics-consistency verification. Evaluated on billiard-style manipulation and quadrotor landing tasks, it achieves sim-to-real zero-shot transfer: real-world success rates improve by 42%, and physics-consistent planning reaches 91.3%.

Technology Category

Application Category

📝 Abstract

Autonomous robots must reason about the physical consequences of their actions to operate effectively in unstructured, real-world environments. We present Scan, Materialize, Simulate (SMS), a unified framework that combines 3D Gaussian Splatting for accurate scene reconstruction, visual foundation models for semantic segmentation, vision-language models for material property inference, and physics simulation for reliable prediction of action outcomes. By integrating these components, SMS enables generalizable physical reasoning and object-centric planning without the need to re-learn foundational physical dynamics. We empirically validate SMS in a billiards-inspired manipulation task and a challenging quadrotor landing scenario, demonstrating robust performance on both simulated domain transfer and real-world experiments. Our results highlight the potential of bridging differentiable rendering for scene reconstruction, foundation models for semantic understanding, and physics-based simulation to achieve physically grounded robot planning across diverse settings.

Problem

Research questions and friction points this paper is trying to address.

Enabling robots to reason about physical action consequences

Integrating scene reconstruction, semantic understanding, and physics simulation

Achieving generalizable physical planning without relearning dynamics

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting for scene reconstruction

Vision-language models infer material properties

Physics simulation predicts action outcomes

🔎 Similar Papers

Open-Source, Cost-Aware Kinematically Feasible Planning for Mobile and Surface Robotics