🤖 AI Summary
Embodied AI evaluation, particularly for visual navigation, in open urban environments suffers from limited reproducibility due to insufficient simulation fidelity: existing methods fail to simultaneously achieve high-fidelity sensor rendering and geometrically accurate interaction, while emerging paradigms (e.g., video-to-3D Gaussian Splatting) still exhibit substantial visual and geometric realism gaps. Method: We propose the first real-to-sim framework integrating multi-sensor data acquisition, collaborative NeRF and 3D Gaussian Splatting (3DGS) reconstruction, and geometry-constrained novel view synthesis, enabling high-precision geometric modeling and photorealistic perception simulation for complex indoor-outdoor urban scenes. Contribution/Results: We construct a diverse urban-scene dataset and empirically demonstrate that geometric accuracy critically determines novel view synthesis quality and navigation policy generalizability. Our framework substantially narrows the sim-to-real gap and enables joint benchmarking of navigation, view synthesis, and 3D reconstruction, thereby enhancing evaluation credibility and reproducibility.
📄 Abstract
Reproducible closed-loop evaluation remains a major bottleneck in Embodied AI tasks such as visual navigation. A promising path forward is high-fidelity simulation that combines photorealistic sensor rendering with geometrically grounded interaction in complex, open-world urban environments. Although recent video-to-3DGS methods ease open-world scene capture, they remain unsuitable for benchmarking due to large visual and geometric sim-to-real gaps. To address these challenges, we introduce Wanderland, a real-to-sim framework that features multi-sensor capture, reliable reconstruction, accurate geometry, and robust view synthesis. Using this pipeline, we curate a diverse dataset of indoor-outdoor urban scenes and systematically demonstrate how image-only pipelines scale poorly, how geometry quality impacts novel view synthesis, and how both adversely affect navigation policy learning and evaluation reliability. Beyond serving as a trusted testbed for embodied navigation, Wanderland's rich raw sensor data further enables benchmarking of 3D reconstruction and novel view synthesis models. Our work establishes a new foundation for reproducible research in open-world embodied AI. The project website is at https://ai4ce.github.io/wanderland/.