CityGo: Lightweight Urban Modeling and Rendering with Proxy Buildings and Residual Gaussians

📅 2025-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address severe occlusion, incomplete geometry, high memory overhead, and poor suitability for edge deployment in large-scale urban aerial reconstruction, this paper proposes CityGo, a hybrid representation that combines textured proxy building meshes with residual 3D Gaussians. The method extracts proxy geometry from multi-view stereo (MVS) point clouds, places depth-guided residual Gaussians where the proxy fails to match the photos, applies importance-aware downsampling to non-critical regions, and jointly optimizes proxy textures and Gaussian parameters. Zero-order spherical harmonic (SH) Gaussians, together with image-based rendering and back-projection, produce occlusion-free proxy textures. On real-world aerial datasets, the approach achieves an average 1.4× training speedup with visual fidelity comparable to pure 3D Gaussian Splatting (3DGS), while substantially reducing GPU memory and energy consumption. It also enables real-time rendering of complex urban scenes on consumer-grade mobile GPUs, addressing the dense primitive usage, long training times, and poor on-device suitability of 3DGS.

📝 Abstract

Accurate and efficient modeling of large-scale urban scenes is critical for applications such as AR navigation, UAV-based inspection, and smart city digital twins. While aerial imagery offers broad coverage and compensates for the limitations of ground-based data, reconstructing city-scale environments from such views remains challenging due to occlusions, incomplete geometry, and high memory demands. Recent advances like 3D Gaussian Splatting (3DGS) improve scalability and visual quality but remain limited by dense primitive usage, long training times, and poor suitability for edge devices. We propose CityGo, a hybrid framework that combines textured proxy geometry with residual and surrounding 3D Gaussians for lightweight, photorealistic rendering of urban scenes from aerial perspectives. Our approach first extracts compact building proxy meshes from MVS point clouds, then uses zero-order SH Gaussians to generate occlusion-free textures via image-based rendering and back-projection. To capture high-frequency details, we introduce residual Gaussians placed based on proxy-photo discrepancies and guided by depth priors. Broader urban context is represented by surrounding Gaussians, with importance-aware downsampling applied to non-critical regions to reduce redundancy. A tailored optimization strategy jointly refines proxy textures and Gaussian parameters, enabling real-time rendering of complex urban scenes on mobile GPUs with significantly reduced training and memory requirements. Extensive experiments on real-world aerial datasets demonstrate that our hybrid representation significantly reduces training time, achieving a 1.4x speedup on average, while delivering visual fidelity comparable to pure 3D Gaussian Splatting approaches. Furthermore, CityGo enables real-time rendering of large-scale urban scenes on mobile consumer GPUs, with substantially reduced memory usage and energy consumption.
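
The residual-Gaussian placement described in the abstract (seeding detail where the textured proxy disagrees with the photo, guided by depth) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `residual_seed_points`, the mean absolute RGB error, the threshold `tau`, and the pinhole intrinsics `fx, fy, cx, cy` are all assumptions made for illustration.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]          # colour channels in [0, 1]
Point3 = Tuple[float, float, float]       # camera-space (x, y, z)


def residual_seed_points(
    proxy_render: List[List[RGB]],   # image rendered from the textured proxy
    photo: List[List[RGB]],          # ground-truth aerial photo, same size
    depth: List[List[float]],        # per-pixel depth prior (hypothetical input)
    fx: float, fy: float, cx: float, cy: float,
    tau: float = 0.1,                # discrepancy threshold (assumed value)
) -> List[Point3]:
    """Back-project pixels whose proxy-photo error exceeds tau.

    Each returned point is a candidate centre for a residual Gaussian.
    """
    points: List[Point3] = []
    for v, (row_p, row_i, row_d) in enumerate(zip(proxy_render, photo, depth)):
        for u, (p, i, z) in enumerate(zip(row_p, row_i, row_d)):
            # Mean absolute colour difference over the three channels.
            err = sum(abs(a - b) for a, b in zip(p, i)) / 3.0
            if err > tau and z > 0.0:
                # Standard pinhole back-projection into camera space.
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append((x, y, z))
    return points


if __name__ == "__main__":
    grey = (0.5, 0.5, 0.5)
    proxy = [[grey, grey], [grey, grey]]
    photo = [[grey, (1.0, 1.0, 1.0)], [grey, grey]]   # one mismatching pixel
    depth = [[2.0, 2.0], [2.0, 2.0]]
    print(residual_seed_points(proxy, photo, depth, 1.0, 1.0, 0.0, 0.0))
```

Only the pixel where the photo differs from the proxy render yields a seed point, so regions the proxy already explains well receive no extra Gaussians, which is the intuition behind the "residual" naming.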

Problem

Research questions and friction points this paper is trying to address.

Efficient modeling of large urban scenes for AR and UAV applications

Overcoming occlusion and memory issues in aerial view reconstruction

Reducing training time and resource use for mobile rendering

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid framework combining proxy geometry and Gaussians

Residual Gaussians for high-frequency detail capture

Importance-aware downsampling to reduce redundancy
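
As a rough illustration of importance-aware downsampling, the sketch below keeps every Gaussian whose importance score is above a threshold and randomly thins the rest. The scoring scheme, the function name `downsample_by_importance`, and the parameters `critical_threshold` and `keep_ratio` are hypothetical; the paper's actual importance measure and sampling rule are not given in this summary.

```python
import random
from typing import List, Sequence


def downsample_by_importance(
    scores: Sequence[float],          # one importance score per Gaussian (assumed given)
    critical_threshold: float = 0.5,  # scores at or above this are always kept
    keep_ratio: float = 0.25,         # fraction of non-critical Gaussians to keep
    seed: int = 0,                    # fixed seed for reproducible sampling
) -> List[int]:
    """Return the sorted indices of the Gaussians that survive downsampling."""
    rng = random.Random(seed)
    critical = [i for i, s in enumerate(scores) if s >= critical_threshold]
    rest = [i for i, s in enumerate(scores) if s < critical_threshold]
    # Uniformly subsample the non-critical set without replacement.
    n_keep = int(len(rest) * keep_ratio)
    kept_rest = rng.sample(rest, n_keep)
    return sorted(critical + kept_rest)


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.2, 0.8, 0.05, 0.3]
    kept = downsample_by_importance(scores)
    print(kept)  # always contains indices 0 and 3, plus one sampled low-score index
```

Uniform thinning of the low-importance set is the simplest possible choice; a score-weighted sampler would be a natural alternative if finer control over redundancy is needed.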

🔎 Similar Papers

Periodic Vibration Gaussian: Dynamic Urban Scene Reconstruction and Real-time Rendering