🤖 AI Summary
Existing autonomous driving simulation methods struggle to balance photorealism and controllability in scene generation. This work proposes HorizonForge, a unified framework that reconstructs scenes in an editable hybrid Gaussian-Mesh representation (Gaussian Splats plus Meshes), supporting language-driven vehicle insertion and fine-grained 3D manipulation such as trajectory edits. To ensure spatial and temporal consistency, edits are rendered through a noise-aware video diffusion process, and scene variations are produced in a single feed-forward pass, with no per-trajectory optimization. The authors present this as a simple yet powerful paradigm for photorealistic, controllable driving simulation. Experiments report an 83.4% user-preference gain and a 25.19% FID improvement over the second-best state-of-the-art method. The authors also introduce HorizonSuite, a comprehensive benchmark for evaluating controllable driving simulation across ego- and agent-level editing tasks.
📝 Abstract
Controllable driving scene generation is critical for realistic and scalable autonomous driving simulation, yet existing approaches struggle to jointly achieve photorealism and precise control. We introduce HorizonForge, a unified framework that reconstructs scenes as editable Gaussian Splats and Meshes, enabling fine-grained 3D manipulation and language-driven vehicle insertion. Edits are rendered through a noise-aware video diffusion process that enforces spatial and temporal consistency, producing diverse scene variations in a single feed-forward pass without per-trajectory optimization. To standardize evaluation, we further propose HorizonSuite, a comprehensive benchmark spanning ego- and agent-level editing tasks such as trajectory modifications and object manipulation. Extensive experiments show that the Gaussian-Mesh representation delivers substantially higher fidelity than alternative 3D representations, and that temporal priors from video diffusion are essential for coherent synthesis. Combining these findings, HorizonForge establishes a simple yet powerful paradigm for photorealistic, controllable driving simulation, achieving an 83.4% user-preference gain and a 25.19% FID improvement over the second-best state-of-the-art method. Project page: https://horizonforge.github.io/