Realistic and Controllable 3D Gaussian-Guided Object Editing for Driving Video Generation

📅 2025-08-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the scarcity, high acquisition cost, and safety risks of collecting corner cases for autonomous driving, this paper proposes G²Editor, a 3D-aware framework for editing objects in driving videos. It uses a 3D Gaussian representation of the edited object as a dense prior, injected into the diffusion denoising process, to control pose precisely and keep the result spatially consistent. A scene-level 3D bounding box layout guides the reconstruction of occluded regions of non-target objects, and hierarchical fine-grained features condition generation on the appearance details of the edited object. Experiments on the Waymo Open Dataset show that G²Editor supports object repositioning, insertion, and deletion within a unified framework and outperforms existing methods in both pose controllability and visual quality, while also benefiting downstream data-driven tasks.

📝 Abstract

Corner cases are crucial for training and validating autonomous driving systems, yet collecting them from the real world is often costly and hazardous. Editing objects within captured sensor data offers an effective alternative for generating diverse scenarios, commonly achieved through 3D Gaussian Splatting or image generative models. However, these approaches often suffer from limited visual fidelity or imprecise pose control. To address these issues, we propose G^2Editor, a framework designed for photorealistic and precise object editing in driving videos. Our method leverages a 3D Gaussian representation of the edited object as a dense prior, injected into the denoising process to ensure accurate pose control and spatial consistency. A scene-level 3D bounding box layout is employed to reconstruct occluded areas of non-target objects. Furthermore, to guide the appearance details of the edited object, we incorporate hierarchical fine-grained features as additional conditions during generation. Experiments on the Waymo Open Dataset demonstrate that G^2Editor effectively supports object repositioning, insertion, and deletion within a unified framework, outperforming existing methods in both pose controllability and visual quality, while also benefiting downstream data-driven tasks.
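The abstract does not describe how an object's Gaussians are moved to a new pose, so the following is only a minimal sketch of the underlying geometry, not the authors' implementation. It assumes each Gaussian is stored as a mean and a 3x3 covariance in an object-centric frame, and that repositioning amounts to a yaw rotation plus a translation. The function names `yaw_rotation` and `reposition_gaussians` are hypothetical.

```python
import numpy as np


def yaw_rotation(yaw: float) -> np.ndarray:
    """Rotation matrix for a rotation of `yaw` radians about the z (up) axis."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])


def reposition_gaussians(means, covs, yaw, translation):
    """Apply a rigid transform to a set of 3D Gaussians (illustrative helper).

    means: (N, 3) Gaussian centers in the object frame.
    covs:  (N, 3, 3) Gaussian covariances in the object frame.
    Returns the transformed means and covariances.
    """
    R = yaw_rotation(yaw)
    # Centers transform like points: x' = R x + t.
    new_means = means @ R.T + np.asarray(translation, dtype=float)
    # Covariances transform as Sigma' = R Sigma R^T; translation has no effect.
    new_covs = R @ covs @ R.T
    return new_means, new_covs


# Example: one elongated Gaussian, turned by 90 degrees and moved 10 m along x.
means = np.array([[1.0, 0.0, 0.0]])
covs = np.diag([4.0, 1.0, 1.0])[None, :, :]
moved_means, moved_covs = reposition_gaussians(means, covs, np.pi / 2, (10.0, 0.0, 0.0))
```

In a pipeline of the kind the abstract describes, the transformed Gaussians would then be rendered into the target view to provide the dense prior; that rendering step is outside this sketch.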

Problem

Research questions and friction points this paper is trying to address.

Generates realistic driving videos with precise object editing

Addresses limited visual fidelity in 3D Gaussian editing methods

Solves imprecise pose control in driving scenario generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian representation for pose control

Scene-level 3D bounding box for occlusion handling

Hierarchical fine-grained features for appearance guidance
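One way to picture a scene-level 3D bounding box layout is as a per-pixel instance map obtained by projecting every box into the camera and painting far boxes before near ones, so that the occlusion order is encoded. The sketch below illustrates that idea under stated assumptions: a pinhole camera with intrinsics `K`, boxes already expressed in the camera frame (x right, y down, z forward), and yaw about the vertical axis. The helper names and the rectangle-based rasterization are hypothetical and are not taken from the paper.

```python
import numpy as np


def box_corners(center, size, yaw):
    """Eight corners of a 3D box in the camera frame (x right, y down, z forward)."""
    sx, sy, sz = size
    signs = np.array([[dx, dy, dz] for dx in (1, -1) for dy in (1, -1) for dz in (1, -1)])
    corners = signs * np.array([sx, sy, sz]) / 2.0
    c, s = np.cos(yaw), np.sin(yaw)
    # Rotation about the vertical (y) axis.
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    return corners @ R.T + np.asarray(center, dtype=float)


def project_points(points_cam, K):
    """Pinhole projection; returns (N, 2) pixel coordinates and (N,) depths."""
    uvw = points_cam @ K.T
    depth = uvw[:, 2]
    return uvw[:, :2] / depth[:, None], depth


def box_layout_map(boxes, K, height, width):
    """Paint projected box extents into an instance-id map, far to near.

    boxes: list of (instance_id, center, size, yaw). Id 0 means background.
    Boxes with any corner behind the camera are skipped for simplicity.
    """
    layout = np.zeros((height, width), dtype=np.int32)
    projected = []
    for instance_id, center, size, yaw in boxes:
        uv, depth = project_points(box_corners(center, size, yaw), K)
        if np.any(depth <= 0):
            continue
        projected.append((depth.mean(), instance_id, uv))
    # Far boxes first so that nearer boxes overwrite them.
    for _, instance_id, uv in sorted(projected, key=lambda item: -item[0]):
        u0 = max(0, int(np.floor(uv[:, 0].min())))
        u1 = min(width, int(np.ceil(uv[:, 0].max())))
        v0 = max(0, int(np.floor(uv[:, 1].min())))
        v1 = min(height, int(np.ceil(uv[:, 1].max())))
        layout[v0:v1, u0:u1] = instance_id
    return layout


K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
boxes = [
    (1, (0.0, 0.0, 10.0), (2.0, 2.0, 2.0), 0.0),  # farther vehicle
    (2, (0.5, 0.0, 5.0), (1.0, 1.0, 1.0), 0.0),   # nearer vehicle, partly in front
]
layout = box_layout_map(boxes, K, height=100, width=100)
```

Where the two boxes overlap, the map holds the id of the nearer box, which is the kind of ordering cue an occlusion-aware generator could use. A real system would likely rasterize box faces rather than axis-aligned rectangles.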

🔎 Similar Papers

DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes

2024-09-06 · arXiv.org · Citations: 1

MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes

2024-05-23 · arXiv.org · Citations: 36

Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting

2024-06-04 · arXiv.org · Citations: 1