CityGo: Lightweight Urban Modeling and Rendering with Proxy Buildings and Residual Gaussians

2025-05-27
AI Summary
To address severe occlusion, geometric incompleteness, high memory overhead, and poor edge-deployment capability in large-scale urban aerial reconstruction, this paper proposes a hybrid representation framework combining proxy building meshes with residual 3D Gaussians. The method integrates proxy geometry derived from multi-view stereo (MVS) with depth-guided residual Gaussians, augmented by importance-aware downsampling and joint optimization. It further incorporates zero-order spherical harmonic (SH) Gaussians for texture generation, image back-projection, and a lightweight design oriented toward mobile GPUs. Evaluated on real-world aerial datasets, the approach achieves a 1.4× training speedup on average while significantly reducing GPU memory consumption and energy usage. Notably, it enables the first real-time rasterization-based rendering of complex urban scenes on consumer-grade mobile GPUs, overcoming limitations of 3D Gaussian Splatting in dense primitive usage, prolonged training duration, and on-device adaptability.

Abstract
Accurate and efficient modeling of large-scale urban scenes is critical for applications such as AR navigation, UAV-based inspection, and smart city digital twins. While aerial imagery offers broad coverage and complements the limitations of ground-based data, reconstructing city-scale environments from such views remains challenging due to occlusions, incomplete geometry, and high memory demands. Recent advances like 3D Gaussian Splatting (3DGS) improve scalability and visual quality but remain limited by dense primitive usage, long training times, and poor suitability for edge devices. We propose CityGo, a hybrid framework that combines textured proxy geometry with residual and surrounding 3D Gaussians for lightweight, photorealistic rendering of urban scenes from aerial perspectives. Our approach first extracts compact building proxy meshes from MVS point clouds, then uses zero-order SH Gaussians to generate occlusion-free textures via image-based rendering and back-projection. To capture high-frequency details, we introduce residual Gaussians placed based on proxy-photo discrepancies and guided by depth priors. Broader urban context is represented by surrounding Gaussians, with importance-aware downsampling applied to non-critical regions to reduce redundancy. A tailored optimization strategy jointly refines proxy textures and Gaussian parameters, enabling real-time rendering of complex urban scenes on mobile GPUs with significantly reduced training and memory requirements. Extensive experiments on real-world aerial datasets demonstrate that our hybrid representation significantly reduces training time, achieving a 1.4× speedup on average, while delivering visual fidelity comparable to pure 3D Gaussian Splatting approaches. Furthermore, CityGo enables real-time rendering of large-scale urban scenes on consumer mobile GPUs, with substantially reduced memory usage and energy consumption.
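The abstract says residual Gaussians are placed where the proxy rendering and the photo disagree, guided by depth priors. The sketch below illustrates one plausible reading of that step and is not the authors' implementation: the function name `residual_seeds`, the grayscale inputs, the fixed `threshold`, and the pinhole intrinsics `fx, fy, cx, cy` are all assumptions made for illustration.

```python
def residual_seeds(proxy_render, photo, depth, fx, fy, cx, cy, threshold=0.1):
    """Return camera-space 3D points where the proxy render disagrees with the photo.

    proxy_render, photo: H x W nested lists of grayscale intensities in [0, 1].
    depth: H x W nested list of depth priors (0 or negative means "no prior").
    fx, fy, cx, cy: pinhole intrinsics (an assumed camera model).
    threshold: hypothetical discrepancy cutoff; not taken from the paper.
    """
    seeds = []
    for v, (row_r, row_p, row_d) in enumerate(zip(proxy_render, photo, depth)):
        for u, (r, p, d) in enumerate(zip(row_r, row_p, row_d)):
            if d <= 0:
                continue  # skip pixels without a valid depth prior
            if abs(r - p) > threshold:
                # Back-project pixel (u, v) at depth d with the pinhole model.
                x = (u - cx) * d / fx
                y = (v - cy) * d / fy
                seeds.append((x, y, d))
    return seeds


proxy = [[0.5, 0.5], [0.5, 0.5]]
photo = [[0.5, 0.9], [0.5, 0.5]]
depth = [[2.0, 4.0], [2.0, 0.0]]
seeds = residual_seeds(proxy, photo, depth, fx=2.0, fy=2.0, cx=0.0, cy=0.0)
```

Only the pixel whose intensity differs by more than the threshold and has a valid depth yields a seed position; a real system would then initialize and optimize a Gaussian at each seed.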
Problem

Research questions and friction points this paper is trying to address.

Efficient modeling of large urban scenes for AR and UAV applications
Overcoming occlusion and memory issues in aerial view reconstruction
Reducing training time and resource use for mobile rendering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid framework combining proxy geometry and Gaussians
Residual Gaussians for high-frequency detail capture
Importance-aware downsampling to reduce redundancy
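As a rough illustration of importance-aware downsampling, the sketch below keeps every Gaussian in critical regions and thins out the rest. The page does not say how importance is measured or how the thinning is done, so the per-Gaussian `importance` scores, the `cutoff`, and the stride-based `keep_every` rule are hypothetical choices, not the paper's method.

```python
def downsample(gaussians, importance, cutoff=0.5, keep_every=4):
    """Keep all high-importance Gaussians and every k-th low-importance one.

    gaussians: list of Gaussian records (any objects).
    importance: per-Gaussian scores in [0, 1] (hypothetical measure).
    cutoff: scores at or above this value are treated as critical.
    keep_every: stride applied to the non-critical Gaussians.
    """
    kept = []
    low_seen = 0  # counts non-critical Gaussians encountered so far
    for g, w in zip(gaussians, importance):
        if w >= cutoff:
            kept.append(g)  # critical region: always keep
        else:
            if low_seen % keep_every == 0:
                kept.append(g)  # non-critical: keep one in keep_every
            low_seen += 1
    return kept


ids = list(range(10))
scores = [0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.1]
kept = downsample(ids, scores)
```

The stride keeps the example deterministic; random or density-based sampling of the non-critical set would serve the same purpose of reducing redundancy.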
Weihang Liu
ShanghaiTech University, GGU Technology Co., Ltd
Yuhui Zhong
DGene
Yuke Li
ShanghaiTech University
Xi Chen
ShanghaiTech University
Jiadi Cui
ShanghaiTech University, Stereye
Honglong Zhang
Migu Cultural Technology Co., Ltd
Lan Xu
ShanghaiTech University
Xin Lou
ShanghaiTech University, GGU Technology Co., Ltd
Yujiao Shi
ShanghaiTech University
3D Computer Vision
Jingyi Yu
Professor, ShanghaiTech University
Computer Vision, Computer Graphics
Yingliang Zhang
DGene
Neural Representation, Light Field, 3D Reconstruction