🤖 AI Summary
To address the high cost and low efficiency of onboard data collection, this paper proposes a novel high-fidelity view synthesis paradigm bridging the infrastructure view to the vehicle view. We introduce the first Gaussian splatting–based cross-view generation framework, integrating adaptive depth warping, cascaded image inpainting, and diffusion modeling, augmented by a cross-view confidence-guided optimization mechanism to ensure multi-view consistency and joint geometric-appearance fidelity. To support training and evaluation, we construct RoadSight—the first real-world, multimodal, multi-view dataset for road scene understanding from infrastructure viewpoints. Experiments demonstrate that our method outperforms StreetGaussian by 45.7%, 34.2%, and 14.9% on NTA-IoU, NTL-IoU, and FID, respectively, significantly enhancing synthetic data quality and downstream task utility.
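The confidence-guided optimization mentioned above can be illustrated with a toy confidence-weighted reconstruction loss. This is an assumed scheme for illustration only; the function name and weighting form are hypothetical and not the paper's exact formulation:

```python
import numpy as np

def confidence_weighted_l1(rendered, target, confidence):
    """Toy confidence-weighted L1 loss (illustrative, not the paper's method).

    Pixels where cross-view confidence is low contribute less to the
    optimization objective; the result is normalized by total confidence.
    """
    confidence = np.asarray(confidence, dtype=float)
    residual = np.abs(np.asarray(rendered, dtype=float) - np.asarray(target, dtype=float))
    return np.sum(confidence * residual) / np.sum(confidence)
```

For example, a pixel with confidence 0 is ignored entirely, so unreliable diffusion-inpainted regions would not pull the reconstruction toward inconsistent content.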
📝 Abstract
Vast amounts of high-quality data are essential for end-to-end autonomous driving systems. However, current driving data is mainly collected by vehicles, which is expensive and inefficient. A potential solution lies in synthesizing data from real-world images. Recent advancements in 3D reconstruction demonstrate photorealistic novel view synthesis, highlighting the potential of generating driving data from images captured on the road. This paper introduces a novel method, I2V-GS, to transfer the Infrastructure view To the Vehicle view with Gaussian Splatting. Reconstruction from sparse infrastructure viewpoints and rendering under large view transformations is a challenging problem. We adopt an adaptive depth warp to generate dense training views. To further expand the range of views, we employ a cascade strategy to inpaint warped images, which also ensures that inpainted content is consistent across views. To ensure the reliability of the diffusion model, we utilize cross-view information to perform a confidence-guided optimization. Moreover, we introduce RoadSight, a multi-modality, multi-view dataset captured from real scenarios in infrastructure views. To our knowledge, I2V-GS is the first framework to generate autonomous driving datasets via infrastructure-to-vehicle view transformation. Experimental results demonstrate that I2V-GS significantly improves synthesis quality under the vehicle view, outperforming StreetGaussian in NTA-IoU, NTL-IoU, and FID by 45.7%, 34.2%, and 14.9%, respectively.
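The depth warp used to generate dense training views can be sketched with the classic pinhole forward-warping formulation: back-project each source pixel using its depth, transform into the target camera frame, and re-project. This is a minimal sketch of standard depth warping under assumed shared intrinsics, not the paper's adaptive variant; `depth_warp` is a hypothetical helper name:

```python
import numpy as np

def depth_warp(depth, K, R, t):
    """Forward-warp source pixel coordinates into a target view.

    depth : (H, W) per-pixel depth in the source camera frame
    K     : (3, 3) camera intrinsics (assumed shared by both views)
    R, t  : rotation (3, 3) and translation (3,) from source to target frame

    Returns an (H, W, 2) array giving, for each source pixel, its projected
    (u, v) location in the target image.
    """
    h, w = depth.shape
    # Homogeneous pixel grid, shape 3 x N
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T.astype(float)
    # Back-project to 3D points in the source camera frame
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    # Rigid transform into the target frame, then perspective re-projection
    pts_t = R @ pts + t.reshape(3, 1)
    proj = K @ pts_t
    uv = proj[:2] / proj[2:3]
    return uv.T.reshape(h, w, 2)
```

With an identity pose the warp maps every pixel to itself, which is a useful sanity check before applying large infrastructure-to-vehicle view changes, where occlusions and holes in the warped image then motivate the cascaded inpainting step.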