Feed-Forward Gaussian Splatting from Sparse Aerial Views

📅 2026-05-19

🤖 AI Summary
This work addresses the ghosting artifacts, melted facades, and stretched textures that commonly arise when urban scenes are reconstructed from sparse aerial views. To this end, it proposes AnyCity, a framework that combines an observation-supported geometry latent with scaffold-conditioned aerial completion tokens and reconstructs the scene in a single forward pass. A gated residual mechanism selectively updates weakly constrained regions, while an observation-anchoring strategy explicitly separates observed geometry from prior-generated content. By combining 3D Gaussian splatting, an aerial-adapted video diffusion prior used during training, and observation-preserving objectives, AnyCity consistently outperforms existing feed-forward methods on both synthetic and real-world scenes, achieving high-quality novel view synthesis with second-level inference.
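To make the gated residual idea concrete, here is a minimal NumPy sketch, assuming the geometry latent and the completion-token output are both per-token arrays of shape (N, D) and that the gate is a per-token sigmoid. The function name `gated_residual_update`, the `support` input (fraction of input views observing each token), and the choice to damp the gate by `1 - support` are all hypothetical illustrations of "update only weakly constrained regions", not the paper's actual formulation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_residual_update(z_obs, delta, gate_logits, support):
    """Hypothetical gated residual update (illustrative sketch only).

    z_obs:       (N, D) observation-supported geometry latent
    delta:       (N, D) residual proposed by the completion tokens
    gate_logits: (N,)   per-token gate logits (assumed to be predicted elsewhere)
    support:     (N,)   in [0, 1], fraction of input views observing each token (assumption)
    """
    # Learned gate, damped where observations are strong so observed geometry stays anchored.
    gate = sigmoid(gate_logits) * (1.0 - support)
    # Residual update: z_obs passes through unchanged wherever the gate is zero.
    z_refined = z_obs + gate[:, None] * delta
    return z_refined, gate


# Usage: a fully observed token (support=1) is untouched; an unobserved one takes the residual.
z_obs = np.zeros((2, 3))
delta = np.ones((2, 3))
gate_logits = np.array([10.0, 10.0])
support = np.array([1.0, 0.0])
z_refined, gate = gated_residual_update(z_obs, delta, gate_logits, support)
```

Writing the update as a residual on top of `z_obs` means the observation-supported latent is the default, and the generative branch has to "earn" each change through the gate, which is one simple way to keep observed and prior-driven content separable.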
📝 Abstract
Reconstructing large-scale urban scenes from sparse aerial views is a crucial yet challenging task. Due to biased top-down and shallow-oblique camera poses, sparse aerial captures exhibit strong evidence imbalance: roofs and open regions are repeatedly observed, while facades, distant buildings, and occluded structures receive little multi-view support. Existing feed-forward 3D Gaussian Splatting methods directly regress a deterministic representation from sparse inputs, but this often leads to ghosting, melted facades, and stretched textures. Recent pseudo-view and video-based generative reconstruction methods use additional supervision or generative priors. However, they often lack a clear separation between observed geometry and prior-driven content, which can lead to plausible but inconsistent structures. We propose AnyCity, an observation-grounded generative reconstruction framework for sparse aerial urban scenes. AnyCity first predicts an observation-supported geometry latent to anchor reliable structures, and then uses scaffold-conditioned aerial completion tokens to predict a gated residual update for weakly constrained content before Gaussian decoding. During training, dense-to-sparse distillation transfers structural cues from dense-view reconstruction, while an aerial-adapted video diffusion prior provides fine-grained urban appearance cues through gated token conditioning. Observation-preserving objectives keep the refined representation consistent with input-supported geometry. At inference time, AnyCity reconstructs the final 3D Gaussian scene from sparse aerial views in a single feed-forward pass, achieving coherent urban novel-view synthesis with second-level inference. Experiments on synthetic, aerial-domain, UAV-textured, and real-world scenes show consistent improvements over feed-forward baselines.
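The abstract names three training ingredients that interact with evidence imbalance: per-region observation support, an observation-preserving objective, and dense-to-sparse distillation. The NumPy sketch below illustrates one plausible shape for each, under stated assumptions. The visibility mask, the support-weighted latent anchoring term, the L1 photometric term, the weight `lam_anchor`, and all function names are hypothetical choices for illustration; the paper's actual losses are not specified here.

```python
import numpy as np


def view_support(visibility):
    """Fraction of input views observing each token; visibility is a (V, N) boolean mask (assumed)."""
    return visibility.mean(axis=0)


def observation_preserving_loss(rendered, targets, z_refined, z_obs, support, lam_anchor=1.0):
    """Hypothetical observation-preserving objective (illustrative sketch only)."""
    # Photometric L1 on the input views: the refined scene must still reproduce what was seen.
    photo = np.mean(np.abs(rendered - targets))
    # Per-token squared drift of the refined latent away from the observation-supported latent.
    drift = np.sum((z_refined - z_obs) ** 2, axis=-1)
    # Support-weighted drift: well-observed tokens (e.g. roofs) are anchored,
    # weakly observed ones (e.g. facades) are left free to change.
    anchor = np.sum(support * drift) / max(float(np.sum(support)), 1e-8)
    return float(photo + lam_anchor * anchor)


def dense_to_sparse_distillation(z_sparse, z_dense_teacher):
    """Assumed form: MSE between a sparse-view student latent and a frozen dense-view teacher latent."""
    return float(np.mean((z_sparse - z_dense_teacher) ** 2))


# Usage: token 0 is seen by all 3 views, token 1 by only one view.
visibility = np.array([[True, False], [True, False], [True, True]])
support = view_support(visibility)  # [1.0, 1/3]
z_obs = np.zeros((2, 2))
z_refined = np.array([[0.0, 0.0], [1.0, 1.0]])  # only the weakly observed token moved
images = np.zeros((3, 4, 4))
loss = observation_preserving_loss(images, images, z_refined, z_obs, support)
distill = dense_to_sparse_distillation(np.ones((2, 2)), np.zeros((2, 2)))
```

Here the photometric term is zero and the refined token that moved has only one third support, so the anchoring penalty stays small (0.5 in this toy case); moving the fully observed token by the same amount would be penalized three times as heavily.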
Problem

Research questions and friction points this paper is trying to address.

sparse aerial views
urban scene reconstruction
3D Gaussian Splatting
evidence imbalance
generative reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian Splatting
Sparse Aerial Reconstruction
Generative Prior
Observation-Grounded Geometry
Feed-Forward 3D Reconstruction