Glad: A Streaming Scene Generator for Autonomous Driving

📅 2025-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address challenges in autonomous driving simulation—including difficulty in generating edge-case scenarios, weak temporal coherence, and limited video duration—this paper proposes Glad, an end-to-end streaming video generation framework. Methodologically, Glad introduces a novel latent propagation mechanism that injects the latent feature of the preceding frame as a noise prior into the current frame; designs a streaming data sampler to enable continuous, temporally ordered clip sampling; and integrates diffusion-based modeling with latent-space temporal conditioning, conditional guidance, and streaming-aware training. Evaluated on nuScenes, Glad achieves significant improvements in temporal consistency and scene diversity of generated driving videos, establishing a new strong baseline for online autonomous driving video synthesis. The code and pretrained models will be publicly released.

📝 Abstract
The generation and simulation of diverse real-world scenes have significant application value in the field of autonomous driving, especially for corner cases. Recently, researchers have explored employing neural radiance fields or diffusion models to generate novel views or synthetic data under driving scenes. However, these approaches struggle with unseen scenes or restricted video length, and thus lack sufficient adaptability for data generation and simulation. To address these issues, we propose a simple yet effective framework, named Glad, to generate video data in a frame-by-frame style. To ensure the temporal consistency of synthetic video, we introduce a latent variable propagation module, which treats the latent features of the previous frame as a noise prior and injects them into the latent features of the current frame. In addition, we design a streaming data sampler that samples the frames of a video clip in temporal order across consecutive training iterations. Given a reference frame, Glad can be viewed as a streaming simulator that generates videos for specific scenes. Extensive experiments are performed on the widely used nuScenes dataset. Experimental results demonstrate that Glad achieves promising performance, serving as a strong baseline for online video generation. We will release the source code and models publicly.
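The latent propagation described in the abstract can be sketched roughly as follows. This is a minimal illustration, assuming a DDPM-style forward-noising blend; the function name `propagate_latent` and the exact blending rule are assumptions for exposition, not the paper's formulation:

```python
import numpy as np

def propagate_latent(prev_latent, alpha_bar, rng):
    """Form a noise prior for the current frame by partially re-noising
    the previous frame's latent features (illustrative sketch only).

    alpha_bar in [0, 1]: higher values keep more of the previous frame,
    lower values inject more fresh Gaussian noise.
    """
    noise = rng.standard_normal(prev_latent.shape)
    # Blend previous-frame content with fresh noise, analogous to a
    # diffusion forward step; the result seeds denoising of the next frame.
    return np.sqrt(alpha_bar) * prev_latent + np.sqrt(1.0 - alpha_bar) * noise
```

Starting denoising from such a prior, rather than from pure noise, is what couples consecutive frames and encourages temporal consistency.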
Problem

Research questions and friction points this paper is trying to address.

Generating diverse real-world driving scenes, especially corner cases, for autonomous vehicles
Ensuring temporal consistency in synthetic video generation
Overcoming limited scene adaptability and restricted video length in prior approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frame-by-frame video generation for autonomous driving
Latent variable propagation ensures temporal consistency
Streaming data sampler for continuous video simulation
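The streaming data sampler can be sketched as an iterator that advances a cursor through each clip, so consecutive training iterations see temporally consecutive frames. This is a hypothetical sketch; `streaming_sampler` and the list-of-clips representation are illustrative assumptions:

```python
def streaming_sampler(clips):
    """Yield one batch per iteration, where each batch holds the next
    frame (in temporal order) of every clip that still has frames left.

    clips: list of sequences, each a temporally ordered list of frames.
    Yields: list of (clip_id, frame) pairs, one pair per active clip.
    """
    max_len = max(len(clip) for clip in clips)
    for t in range(max_len):
        # Collect frame t from every clip long enough to have one,
        # so iteration t always sees temporally aligned frames.
        yield [(cid, clip[t]) for cid, clip in enumerate(clips) if t < len(clip)]
```

Sampling this way keeps the frame order within each clip intact across iterations, which is what lets the latent of the previous frame serve as a prior for the current one during training.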