🤖 AI Summary
Embodied AI faces significant challenges in sim-to-real transfer due to fidelity gaps between synthetic simulations and real-world environments.
Method: This paper proposes a lightweight, cost-effective real-scene modeling framework leveraging iPhone-captured imagery and 3D Gaussian Splatting for high-fidelity, personalized scene reconstruction. The reconstructed scenes are integrated into Habitat-Sim, where navigation policies are fine-tuned jointly on mesh-based simulation and real-image-guided navigation tasks.
Contribution/Results: To our knowledge, this is the first work to establish a closed-loop "real → simulation → real" navigation adaptation pipeline. Compared to large-scale pre-trained baselines, our approach achieves absolute improvements of 20–40% in real-world navigation success rate, with simulation-to-reality behavioral correlation reaching 0.87–0.97. These results demonstrate substantially enhanced policy generalization and environmental adaptability.
📄 Abstract
The field of Embodied AI predominantly relies on simulation for training and evaluation, often using either fully synthetic environments that lack photorealism or high-fidelity real-world reconstructions captured with expensive hardware. As a result, sim-to-real transfer remains a major challenge. In this paper, we introduce EmbodiedSplat, a novel approach that personalizes policy training by efficiently capturing the deployment environment and fine-tuning policies within the reconstructed scenes. Our method leverages 3D Gaussian Splatting (GS) and the Habitat-Sim simulator to bridge the gap between realistic scene capture and effective training environments. Using iPhone-captured deployment scenes, we reconstruct meshes via GS, enabling training in settings that closely approximate real-world conditions. We conduct a comprehensive analysis of training strategies, pre-training datasets, and mesh reconstruction techniques, evaluating their impact on sim-to-real predictivity in real-world scenarios. Experimental results demonstrate that agents fine-tuned with EmbodiedSplat outperform zero-shot baselines pre-trained on a large-scale real-world dataset (HM3D) and a synthetically generated dataset (HSSD), achieving absolute success rate improvements of 20% and 40%, respectively, on a real-world Image Navigation task. Moreover, our approach yields a high sim-vs-real correlation (0.87–0.97) for the reconstructed meshes, underscoring its effectiveness in adapting policies to diverse environments with minimal effort. Project page: https://gchhablani.github.io/embodied-splat