🤖 AI Summary
This work addresses the challenge of controllably generating out-of-distribution, safety-critical scenarios—such as collisions and pedestrian crossings—in autonomous driving simulation. The authors propose a novel approach that integrates trajectory editing, 3D point cloud reconstruction, and video diffusion modeling. By reconstructing dynamic traffic participants from multi-frame LiDAR point clouds and enhancing vehicle geometry with 360° completion, the method generates realistic driving scenes. A preference-based reinforcement learning fine-tuning strategy is introduced to significantly improve the visual fidelity and temporal coherence of synthesized videos without requiring ground-truth labels. Notably, this is the first work to incorporate preference-driven reward mechanisms into driving video generation, achieving state-of-the-art performance in both safety-critical scenario synthesis and novel ego-vehicle viewpoint rendering.
📝 Abstract
We present ReinDriveGen, a framework that enables full controllability over dynamic driving scenes, allowing users to freely edit actor trajectories to simulate safety-critical corner cases such as front-vehicle collisions, drifting cars, vehicles spinning out of control, pedestrians jaywalking, and cyclists cutting across lanes. Our approach constructs a dynamic 3D point cloud scene from multi-frame LiDAR data, introduces a vehicle completion module to reconstruct full 360° geometry from partial observations, and renders the edited scene into 2D condition images that guide a video diffusion model to synthesize realistic driving videos. Since such edited scenarios inevitably fall outside the training distribution, we further propose an RL-based post-training strategy in which a pairwise preference model serves as the reward, enabling robust quality improvement under out-of-distribution conditions without ground-truth supervision. Extensive experiments demonstrate that ReinDriveGen outperforms existing approaches on edited driving scenarios and achieves state-of-the-art results on novel ego viewpoint synthesis.
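The abstract does not specify the exact form of the pairwise preference reward. A common formulation for label-free preference learning is the Bradley–Terry model, in which a scalar quality score is assigned to each video and the preference probability between a pair is a logistic function of their score difference. The sketch below is a minimal illustration under that assumption; all function names are hypothetical and not from the paper.

```python
import math

def bt_preference_prob(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that sample A is preferred over sample B,
    given scalar quality scores (e.g., from a learned preference model)."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))

def pairwise_preference_loss(score_pref: float, score_nonpref: float) -> float:
    """Negative log-likelihood that the human/model-preferred sample wins.
    Minimizing this trains the preference model from pairwise comparisons,
    with no ground-truth video labels required."""
    return -math.log(bt_preference_prob(score_pref, score_nonpref))

def pairwise_reward(score_sample: float, score_baseline: float) -> float:
    """A centered reward for RL fine-tuning: positive when the generated
    sample is preferred over a baseline generation, negative otherwise."""
    return bt_preference_prob(score_sample, score_baseline) - 0.5
```

With equal scores the preference probability is 0.5 and the reward is zero; as the fine-tuned generator produces videos the preference model scores above the baseline, the reward becomes positive, which is what drives the quality improvement without ground-truth supervision.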