Decoupled Diffusion Sparks Adaptive Scene Generation

📅 2025-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing traffic-scene generation methods face two bottlenecks: denoising the full sequence at once hinders online reaction, while frame-by-frame prediction lacks precise goal-state guidance. In addition, open datasets are dominated by safe, ordinary driving behaviors, which makes it hard to generate realistic high-risk corner cases. This paper proposes Nexus, a decoupled diffusion framework that generates scenes from fine-grained tokens with independent noise states, combining a partial noise-masking training strategy with a noise-aware schedule that keeps environmental updates timely during denoising. To support challenging-scenario generation, the authors collect 540 hours of simulated corner-case data covering high-risk interactions such as cut-ins, sudden braking, and collisions. Experiments show a 40% reduction in displacement error and a 20% improvement in closed-loop planning through data augmentation.

📝 Abstract

Controllable scene generation could substantially reduce the cost of diverse data collection for autonomous driving. Prior works formulate traffic layout generation as a predictive process, either by denoising entire sequences at once or by iteratively predicting the next frame. However, full-sequence denoising hinders online reaction, while short-sighted next-frame prediction lacks precise goal-state guidance. Further, the learned model struggles to generate complex or challenging scenarios because open datasets consist largely of safe and ordinary driving behaviors. To overcome these limitations, we introduce Nexus, a decoupled scene generation framework that improves reactivity and goal conditioning by simulating both ordinary and challenging scenarios from fine-grained tokens with independent noise states. At the core of the decoupled pipeline is the integration of a partial noise-masking training strategy and a noise-aware schedule that ensures timely environmental updates throughout the denoising process. To complement challenging scenario generation, we collect a dataset consisting of complex corner cases. It covers 540 hours of simulated data, including high-risk interactions such as cut-in, sudden braking, and collision. Nexus achieves superior generation realism while preserving reactivity and goal orientation, with a 40% reduction in displacement error. We further demonstrate that Nexus improves closed-loop planning by 20% through data augmentation and showcase its capability in safety-critical data generation.
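
To make "independent noise states" and "partial noise masking" concrete, here is a minimal sketch in plain Python. It is not the Nexus implementation: the cosine schedule, the scalar tokens, and the names `alpha_bar`, `noise_tokens`, and `keep_clean` are all assumptions chosen for illustration. The idea shown is only that each token draws its own noise level, and that masked tokens stay clean so they can act as conditioning (for example, observed or goal states).

```python
import math
import random


def alpha_bar(t: float) -> float:
    """Assumed cosine schedule: signal kept is 1.0 at t=0 (clean), 0.0 at t=1 (pure noise)."""
    return math.cos(0.5 * math.pi * t) ** 2


def noise_tokens(tokens, keep_clean, rng):
    """Noise each token with its own independently sampled level.

    tokens:     list of floats standing in for per-agent, per-timestep state tokens
    keep_clean: list of bools; True masks the token from noising (level fixed at 0)
    Returns (noisy_tokens, levels).
    """
    noisy, levels = [], []
    for x, clean in zip(tokens, keep_clean):
        t = 0.0 if clean else rng.random()   # independent level per token
        a = alpha_bar(t)
        eps = rng.gauss(0.0, 1.0)
        # Standard forward-diffusion mix of signal and Gaussian noise.
        noisy.append(math.sqrt(a) * x + math.sqrt(1.0 - a) * eps)
        levels.append(t)
    return noisy, levels


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [1.0, 2.0, 3.0, 4.0]
    keep_clean = [True, False, True, False]  # hypothetical: tokens 0 and 2 are known states
    noisy, levels = noise_tokens(tokens, keep_clean, rng)
    print(noisy, levels)
```

In a training loop, a denoiser would receive both the noisy tokens and their per-token levels, so it can learn to treat clean tokens as context and noisy ones as targets.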

Problem

Research questions and friction points this paper is trying to address.

Improving reactivity and goal conditioning in scene generation

Generating complex or challenging driving scenarios effectively

Reducing displacement error and enhancing closed-loop planning performance
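
The listing does not say which displacement metric the 40% figure refers to. As an illustration only, the sketch below computes average displacement error (ADE), a common trajectory metric, and the relative reduction between a baseline and an improved value. The function names and the sample numbers are hypothetical.

```python
import math


def average_displacement_error(pred, truth):
    """Mean Euclidean distance between matched (x, y) waypoints."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, q) for p, q in zip(pred, truth)) / len(pred)


def relative_reduction(baseline, improved):
    """Fractional reduction of an error metric relative to a positive baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    pred = [(0.0, 0.0), (3.0, 4.0)]
    truth = [(0.0, 0.0), (0.0, 0.0)]
    print(average_displacement_error(pred, truth))  # 2.5
    # Made-up numbers: an error falling from 1.0 m to 0.6 m is a 40% reduction.
    print(relative_reduction(1.0, 0.6))
```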

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupled scene generation with independent noise states

Partial noise-masking training strategy integration

Noise-aware schedule for timely environmental updates
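
The listing gives no details of the noise-aware schedule, so the following is only one plausible reading, sketched under stated assumptions: frames nearer to the present are kept at least as clean as later frames at every denoising step, and any frame whose state arrives from the environment is reset to noise level zero. The staggering rule and the names `frame_noise_levels` and `apply_observation` are hypothetical, not taken from the paper.

```python
def frame_noise_levels(num_frames, step, total_steps):
    """Per-frame noise level in [0, 1] at a given denoising step.

    Assumed rule: frame i starts denoising after a delay of i / (2 * num_frames)
    of the total progress, and every frame reaches level 0 at the final step.
    Earlier frames are therefore never noisier than later ones.
    """
    progress = step / total_steps  # 0.0 at the start, 1.0 at the end
    levels = []
    for i in range(num_frames):
        delay = i / (2 * num_frames)
        local = (progress - delay) / (1 - delay)
        levels.append(1.0 - min(1.0, max(0.0, local)))
    return levels


def apply_observation(levels, observed):
    """Set the noise level to 0 for frames whose state was just observed."""
    return [0.0 if i in observed else lv for i, lv in enumerate(levels)]


if __name__ == "__main__":
    levels = frame_noise_levels(num_frames=4, step=5, total_steps=10)
    print(levels)
    # Hypothetical environment update: frame 2's true state becomes available.
    print(apply_observation(levels, observed={2}))
```

A per-frame schedule of this kind is what allows a generator to react mid-denoising: a fresh observation can replace a partially denoised token without restarting the whole sequence.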

🔎 Similar Papers

LT3SD: Latent Trees for 3D Scene Diffusion