AI Summary
This work addresses the challenge of reconstructing large-scale, geometrically accurate, and interactive 3D urban scenes from satellite imagery, without requiring any 3D annotations. The proposed end-to-end framework integrates satellite image structural parsing, curriculum-driven iterative optimization, cross-view consistency constraints, Gaussian splatting rendering, and open-domain diffusion models to synthesize detailed appearance under coarse geometric guidance. Key contributions include: (1) the first high-fidelity 3D urban modeling method at the city-block scale without 3D supervision; (2) a geometry-appearance co-optimization strategy that improves structural integrity and texture realism; and (3) real-time, immersive scene navigation. Quantitative and qualitative evaluations show improvements over state-of-the-art methods in geometric accuracy, multi-view consistency, and visual fidelity.
Abstract
Synthesizing large-scale, explorable, and geometrically accurate 3D urban scenes is a challenging yet valuable task for enabling immersive and embodied applications. The main challenge is the lack of large-scale, high-quality real-world 3D scans for training generalizable generative models. In this paper, we take an alternative route to creating large-scale 3D scenes by combining readily available satellite imagery, which supplies realistic coarse geometry, with an open-domain diffusion model, which creates high-quality close-up appearances. We propose Skyfall-GS, the first city-block scale 3D scene creation framework that requires no costly 3D annotations and also supports real-time, immersive 3D exploration. We tailor a curriculum-driven iterative refinement strategy to progressively enhance geometric completeness and photorealistic textures. Extensive experiments demonstrate that Skyfall-GS provides improved cross-view consistent geometry and more realistic textures compared to state-of-the-art approaches. Project page: https://skyfall-gs.jayinnn.dev/