HorizonForge: Driving Scene Editing with Any Trajectories and Any Vehicles

📅 2026-02-24

🤖 AI Summary

Existing autonomous driving simulation methods struggle to balance photorealism and controllability in scene generation. This work proposes a hybrid Gaussian-Mesh representation that reconstructs each scene as a unified, editable 3D model supporting language-driven vehicle insertion and fine-grained trajectory manipulation. To keep results spatially and temporally consistent, edits are rendered with noise-aware video diffusion in a single feed-forward pass, with no per-trajectory optimization. To the authors' knowledge, this is the first approach to achieve high-fidelity, arbitrarily controllable driving scene editing. Experiments report an 83.4% user-preference gain and a 25.19% FID improvement over the second-best method. The authors also introduce HorizonSuite, a comprehensive benchmark for evaluating controllable driving simulation across ego- and agent-level editing tasks.

📝 Abstract

Controllable driving scene generation is critical for realistic and scalable autonomous driving simulation, yet existing approaches struggle to jointly achieve photorealism and precise control. We introduce HorizonForge, a unified framework that reconstructs scenes as editable Gaussian Splats and Meshes, enabling fine-grained 3D manipulation and language-driven vehicle insertion. Edits are rendered through a noise-aware video diffusion process that enforces spatial and temporal consistency, producing diverse scene variations in a single feed-forward pass without per-trajectory optimization. To standardize evaluation, we further propose HorizonSuite, a comprehensive benchmark spanning ego- and agent-level editing tasks such as trajectory modifications and object manipulation. Extensive experiments show that the Gaussian-Mesh representation delivers substantially higher fidelity than alternative 3D representations, and that temporal priors from video diffusion are essential for coherent synthesis. Combining these findings, HorizonForge establishes a simple yet powerful paradigm for photorealistic, controllable driving simulation, achieving an 83.4% user-preference gain and a 25.19% FID improvement over the second-best state-of-the-art method. Project page: https://horizonforge.github.io/

Problem

Research questions and friction points this paper is trying to address.

driving scene generation

photorealism

precise control

3D scene editing

autonomous driving simulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian Splatting

Video Diffusion

Controllable Scene Generation