🤖 AI Summary
Existing novel view synthesis (NVS) methods are typically evaluated on frames sampled along the training trajectory, which fails to reflect the cross-lane extrapolation capability essential for closed-loop autonomous driving simulation. To address this gap, we introduce the first benchmark dataset specifically designed for cross-lane NVS evaluation. It comprises six sequences covering multiple times of day and weather conditions, each with 450 training frames and 120 cross-lane test frames (lateral offsets of 1 to 4 m), accompanied by precise pose annotations and calibrated camera intrinsics/extrinsics. We further propose the first standardized cross-lane NVS evaluation protocol, supporting both front-facing monocular and multi-camera configurations. Comprehensive experiments reveal that state-of-the-art NVS methods suffer an average PSNR degradation exceeding 8 dB under cross-lane conditions, exposing their limited generalization beyond the training trajectory. This underscores the dataset's role in enabling rigorous, safety-oriented assessment for driving simulators.
📝 Abstract
Comprehensive testing of autonomous systems through simulation is essential to ensure the safety of autonomous driving vehicles. This requires generating safety-critical scenarios that go beyond the limits of real-world data collection, since many of these scenarios are rarely encountered on public roads. However, most existing novel view synthesis (NVS) methods are evaluated by sporadically sampling image frames from the training data and comparing the rendered images with the ground-truth images. This evaluation protocol does not meet the actual requirements of closed-loop simulation. Specifically, the real application demands the ability to render novel views that lie beyond the original trajectory (such as cross-lane views), which are challenging to capture in the real world. To address this, this paper presents a synthetic dataset for novel driving view synthesis evaluation, designed specifically for autonomous driving simulation. The dataset includes testing images captured by deviating from the training trajectory by $1$ to $4$ meters. It comprises six sequences that cover various times of day and weather conditions. Each sequence contains $450$ training images, $120$ testing images, and their corresponding camera poses and intrinsic parameters. Leveraging this dataset, we establish the first realistic benchmark for evaluating existing NVS approaches under front-only and multi-camera settings. The experimental findings reveal a significant gap in current approaches, which fall short of the demanding requirements of cross-lane or closed-loop simulation.
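The cross-lane protocol described above can be illustrated with a minimal sketch: shift a training camera pose sideways by a fixed offset, render from the shifted pose, and score the rendering against the ground-truth image captured at that offset. The function names `lateral_offset_pose` and `psnr` are hypothetical, the camera convention (a 4x4 camera-to-world matrix whose x axis points to the right) is an assumption, and PSNR is used here only as one common image metric; none of this is taken from the dataset's own tooling.

```python
import numpy as np


def lateral_offset_pose(cam_to_world: np.ndarray, offset_m: float) -> np.ndarray:
    """Return a copy of a 4x4 camera-to-world pose shifted sideways.

    Assumption: the first column of the rotation block is the camera's
    x axis (pointing right) expressed in world coordinates.
    """
    shifted = cam_to_world.astype(np.float64)  # astype returns a new array
    right_axis = shifted[:3, 0].copy()
    shifted[:3, 3] += offset_m * right_axis
    return shifted


def psnr(rendered: np.ndarray, ground_truth: np.ndarray, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    diff = rendered.astype(np.float64) - ground_truth.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Hypothetical usage: build the four test poses for one training pose.
train_pose = np.eye(4)
test_poses = [lateral_offset_pose(train_pose, d) for d in (1.0, 2.0, 3.0, 4.0)]
```

Comparing the mean score on on-trajectory frames with the mean score at each offset gives a simple picture of how quickly a method degrades as the viewpoint leaves the training lane.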