🤖 AI Summary
This work addresses high-quality novel view synthesis in large-scale outdoor scenes, where ground-level imagery is costly to capture at scale and therefore leaves gaps in camera coverage. The authors propose a feed-forward method that uses orthorectified satellite imagery as a global geometric prior, fusing it with GPS-tagged ground images in a unified 3D coordinate frame. By aligning ground and bird's-eye feature representations, the method predicts pixel-aligned Gaussian splats and improves both scene coverage and novel-view synthesis compared to ground imagery alone. The study also introduces a new benchmark for novel view synthesis with georeferenced imagery, which allows comparison to prior state-of-the-art methods.
📝 Abstract
We present Cross-View Splatter, a feed-forward method that predicts pixel-aligned Gaussian splats for outdoor scenes captured both at ground level and by satellite. Faithful reconstructions require good camera coverage, but ground imagery is time-consuming and hard to capture at scale for large outdoor scenes. Fortunately, satellite imagery can provide a global geometric prior that is easy to access via public APIs. Cross-View Splatter fuses orthorectified satellite views with GPS-tagged ground photos to predict Gaussian splats in a unified 3D coordinate frame. By aligning ground and bird's-eye feature representations, our model improves scene coverage and novel-view synthesis compared to ground imagery alone. We train on curated georeferenced datasets and on paired satellite-terrain data mined from open mapping services. We evaluate our method on a new benchmark for novel-view synthesis with georeferenced imagery, which allows comparison to prior state-of-the-art methods. Our code and data preparation will be available at https://nianticspatial.github.io/cross-view-splatter/.
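To make the idea of a unified coordinate frame concrete, the sketch below shows one simple way a GPS tag could be placed on a north-up orthorectified satellite tile. It is an illustration only and is not taken from the Cross-View Splatter code: the function names, the tile layout (centered on a reference point, fixed metres per pixel), and the equirectangular small-area approximation are all assumptions made for this example.

```python
import math

# WGS84 equatorial radius in metres; a spherical Earth is assumed here.
EARTH_RADIUS_M = 6_378_137.0


def gps_to_local_en(lat_deg, lon_deg, origin_lat_deg, origin_lon_deg):
    """Approximate east/north offsets (metres) of a GPS point from an origin.

    Uses an equirectangular approximation, which is only reasonable over
    small areas (hypothetical helper, not the authors' implementation).
    """
    origin_lat = math.radians(origin_lat_deg)
    d_lat = math.radians(lat_deg - origin_lat_deg)
    d_lon = math.radians(lon_deg - origin_lon_deg)
    east = EARTH_RADIUS_M * d_lon * math.cos(origin_lat)
    north = EARTH_RADIUS_M * d_lat
    return east, north


def local_en_to_tile_pixel(east, north, tile_size_px, metres_per_px):
    """Map east/north offsets to (col, row) in a north-up square tile.

    Assumes the tile is centered on the origin; rows grow southward,
    so a positive north offset decreases the row index.
    """
    col = tile_size_px / 2.0 + east / metres_per_px
    row = tile_size_px / 2.0 - north / metres_per_px
    return col, row


if __name__ == "__main__":
    # Hypothetical ground photo slightly north-east of the tile center.
    east, north = gps_to_local_en(51.5010, -0.1240, 51.5007, -0.1246)
    print(local_en_to_tile_pixel(east, north, tile_size_px=512, metres_per_px=0.5))
```

A production pipeline would more likely rely on a proper geodetic library and the tile's actual georeferencing metadata; the approximation here is only meant to show how ground and satellite observations can share one metric frame.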