🤖 AI Summary
To address the geometry-appearance visibility mismatch, poor rendering quality, and limited scalability that arise when synthesizing novel views (NVS) from large-scale 3D point cloud maps in autonomous driving scenarios, this paper proposes a dynamic point-cloud selection method based on an appearance-geometry connectivity graph. It first constructs a multimodal (camera + LiDAR) connectivity graph to guide viewpoint-aware subset selection, then designs a joint adversarial learning and point-rasterization optimization strategy that integrates seamlessly into the 3D Gaussian Splatting framework. The method significantly improves rendering fidelity: it achieves state-of-the-art PSNR/SSIM on large-scale autonomous driving benchmarks, accelerates inference by 3.2×, reduces memory consumption by 67%, and enables real-time, high-fidelity NVS.
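The adversarial component pairs the image synthesizer with a discriminator that scores renders at several resolutions. The paper's actual networks are not reproduced here; the following is a minimal NumPy sketch of multi-resolution scoring with a hinge adversarial loss, where `score_fn` stands in for a learned patch discriminator (all names and the hinge formulation are illustrative assumptions, not the paper's exact losses).

```python
import numpy as np

def avg_pool2(img):
    """2x2 average pooling on an (H, W, C) image; H and W must be even."""
    h, w = img.shape[:2]
    return img.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

def multires_scores(img, score_fn, n_scales=3):
    """Score an image at n_scales resolutions, halving spatial size each time."""
    scores = []
    for _ in range(n_scales):
        scores.append(score_fn(img))   # per-patch realness scores at this scale
        img = avg_pool2(img)
    return scores

def d_hinge(real_scores, fake_scores):
    """Discriminator hinge loss, summed over scales."""
    return sum(np.mean(np.maximum(0.0, 1.0 - r)) + np.mean(np.maximum(0.0, 1.0 + f))
               for r, f in zip(real_scores, fake_scores))

def g_hinge(fake_scores):
    """Generator (image-synthesizer) hinge loss, summed over scales."""
    return sum(-np.mean(f) for f in fake_scores)

# Toy usage: a stand-in "discriminator" that maps pixel brightness to a score.
score_fn = lambda x: x.mean(axis=-1) - 0.5
real = multires_scores(np.full((8, 8, 3), 0.8), score_fn)
fake = multires_scores(np.full((8, 8, 3), 0.16), score_fn)
print(d_hinge(real, fake), g_hinge(fake))
```

At inference the discriminator is discarded and only the synthesizer runs, so this loss machinery adds no run-time cost to rendering.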
📝 Abstract
Current point-based approaches face scalability and rendering-quality limitations on large 3D point cloud maps, because using such maps directly for novel view synthesis (NVS) leads to degraded visualizations. We identify the primary cause of these low-quality renderings as a visibility mismatch between geometry and appearance, stemming from using these two modalities together. To address this problem, we present CE-NPBG, a new approach for NVS in large-scale autonomous driving scenes. Our method is a neural point-based technique that leverages two modalities: posed images (cameras) and synchronized raw 3D point clouds (LiDAR). We first build a connectivity graph between appearance and geometry, which retrieves the points of a large 3D point cloud map that are observed from the current camera perspective and uses them for rendering. By leveraging this connectivity, our method significantly improves rendering quality and improves run-time and scalability, since it uses only a small subset of the points in the map. Our approach associates neural descriptors with the points and uses them to synthesize views. To enhance the encoding of these descriptors and elevate rendering quality, we propose joint adversarial and point-rasterization training: during training, we pair an image-synthesizer network with a multi-resolution discriminator; at inference, we decouple them and use the image-synthesizer alone to generate novel views. We also integrate our proposal into the recent 3D Gaussian Splatting work to highlight its benefits for rendering quality and scalability.
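The retrieval step described above can be illustrated with a minimal keyframe-based sketch (an assumed structure; the paper's actual graph construction may differ): each posed image records which map points it observes, and a query camera fetches the union of points seen by its nearest keyframes, so rendering touches only a small subset of the full map.

```python
import numpy as np

class ConnectivityGraph:
    """Hypothetical appearance-geometry graph: keyframes -> visible map points."""

    def __init__(self):
        self.keyframe_positions = []  # camera centers of posed images
        self.visible_points = []      # per-keyframe index arrays into the global map

    def add_keyframe(self, cam_center, point_indices):
        self.keyframe_positions.append(np.asarray(cam_center, dtype=float))
        self.visible_points.append(np.asarray(point_indices, dtype=int))

    def query(self, cam_center, k=2):
        """Union of point indices seen by the k keyframes nearest the query camera."""
        centers = np.stack(self.keyframe_positions)
        dists = np.linalg.norm(centers - np.asarray(cam_center, dtype=float), axis=1)
        nearest = np.argsort(dists)[:k]
        return np.unique(np.concatenate([self.visible_points[i] for i in nearest]))

# Toy usage: a 6-point map observed by two keyframes.
graph = ConnectivityGraph()
graph.add_keyframe([0.0, 0.0, 0.0], [0, 1, 2])
graph.add_keyframe([5.0, 0.0, 0.0], [3, 4, 5])
subset = graph.query([0.5, 0.0, 0.0], k=1)
print(subset)  # → [0 1 2]
```

Only the retrieved subset is passed to the rasterizer (or 3D Gaussian Splatting pipeline), which is what yields the run-time and memory gains the abstract describes.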