🤖 AI Summary
To address the limitations of existing methods in high-fidelity single-view street-scene reconstruction and cross-scene generalization for autonomous driving, this paper proposes a sparse LiDAR-guided multimodal Gaussian neural rendering framework. The method is the first to embed sparse LiDAR depth measurements directly into a differentiable Gaussian rasterization pipeline, establishing a joint image–LiDAR optimization objective. It further introduces a multimodal feature matching mechanism and a multi-scale Gaussian decoder to enable joint modeling of geometry and appearance. The approach significantly improves the accuracy of Gaussian ellipsoid prediction and achieves state-of-the-art rendering quality (PSNR/SSIM) on the Waymo and KITTI benchmarks. Moreover, it demonstrates strong zero-shot generalization to unseen scenes in novel-view synthesis, establishing a lightweight, robust paradigm for single-view street-scene reconstruction.
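The summary does not spell out the joint image–LiDAR objective; the following is a minimal PyTorch sketch of what such an objective typically looks like, assuming an L1 photometric term plus a depth term evaluated only where sparse LiDAR returns exist. The function name, the `lambda_depth` weight, and the choice of L1 losses are illustrative assumptions, not the authors' specification.

```python
import torch
import torch.nn.functional as F

def joint_image_lidar_loss(rendered_rgb, gt_rgb, rendered_depth, lidar_depth,
                           lambda_depth=0.1):
    """Hypothetical joint objective: photometric loss on the rendered image
    plus a depth loss supervised only at pixels with sparse LiDAR returns."""
    # Photometric term over all pixels.
    loss_rgb = F.l1_loss(rendered_rgb, gt_rgb)
    # Sparse LiDAR provides depth at a small subset of pixels; mask out the rest.
    valid = lidar_depth > 0
    loss_depth = F.l1_loss(rendered_depth[valid], lidar_depth[valid])
    return loss_rgb + lambda_depth * loss_depth
```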
📝 Abstract
We present a novel approach, termed ADGaussian, for generalizable street scene reconstruction. The proposed method enables high-quality rendering from single-view input. Unlike prior Gaussian Splatting methods that primarily focus on geometry refinement, we emphasize the importance of joint optimization of image and depth features for accurate Gaussian prediction. To this end, we first incorporate sparse LiDAR depth as an additional input modality, formulating the Gaussian prediction process as a joint learning framework of visual information and geometric cues. Furthermore, we propose a multi-modal feature matching strategy coupled with a multi-scale Gaussian decoding model to enhance the joint refinement of multi-modal features, thereby enabling efficient multi-modal Gaussian learning. Extensive experiments on two large-scale autonomous driving datasets, Waymo and KITTI, demonstrate that our ADGaussian achieves state-of-the-art performance and exhibits superior zero-shot generalization capabilities in novel-view shifting.
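The abstract describes predicting Gaussians from fused image and depth features but does not detail the decoder; below is a minimal sketch of a per-pixel Gaussian prediction head over such fused features. The 14-channel parameterization (center offset, opacity, scale, rotation quaternion, color) and the activation choices follow common conventions in feed-forward Gaussian Splatting and are assumptions, not the authors' design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianHead(nn.Module):
    """Hypothetical per-pixel head: maps fused image+depth features
    to the attributes of one 3D Gaussian per pixel."""
    def __init__(self, feat_dim=64):
        super().__init__()
        # 3 offset + 1 opacity + 3 scale + 4 rotation + 3 color = 14 channels
        self.proj = nn.Conv2d(feat_dim, 14, kernel_size=1)

    def forward(self, fused_feat):
        out = self.proj(fused_feat)
        offset  = out[:, 0:3]                       # refine LiDAR-seeded centers
        opacity = torch.sigmoid(out[:, 3:4])        # constrain to (0, 1)
        scale   = torch.exp(out[:, 4:7])            # strictly positive scales
        rot     = F.normalize(out[:, 7:11], dim=1)  # unit quaternion
        color   = torch.sigmoid(out[:, 11:14])      # RGB in (0, 1)
        return offset, opacity, scale, rot, color
```

In this sketch, `fused_feat` would come from a cross-modal matching module between the image encoder and a sparse-depth encoder, applied at multiple scales; that module is not shown here.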