🤖 AI Summary
Extreme corner cases in autonomous driving are scarce, costly to collect, and hazardous to capture. To address this, the paper proposes G²Editor, a 3D-aware framework for editing objects in driving videos. It uses a 3D Gaussian representation of the edited object as a dense prior, injected into the diffusion denoising process to give accurate pose control and spatial consistency. A scene-level 3D bounding box layout is used to reconstruct occluded regions of non-target objects, and hierarchical fine-grained features are added as conditions to guide the appearance details of the edited object. Experiments on the Waymo Open Dataset show that G²Editor supports object repositioning, insertion, and deletion within a single framework and outperforms existing methods in both pose controllability and visual quality. The edited data also benefits downstream data-driven tasks, supporting its use for autonomous driving simulation and robustness evaluation.
📝 Abstract
Corner cases are crucial for training and validating autonomous driving systems, yet collecting them from the real world is often costly and hazardous. Editing objects within captured sensor data offers an effective alternative for generating diverse scenarios, commonly achieved through 3D Gaussian Splatting or image generative models. However, these approaches often suffer from limited visual fidelity or imprecise pose control. To address these issues, we propose G²Editor, a framework designed for photorealistic and precise object editing in driving videos. Our method leverages a 3D Gaussian representation of the edited object as a dense prior, injected into the denoising process to ensure accurate pose control and spatial consistency. A scene-level 3D bounding box layout is employed to reconstruct occluded areas of non-target objects. Furthermore, to guide the appearance details of the edited object, we incorporate hierarchical fine-grained features as additional conditions during generation. Experiments on the Waymo Open Dataset demonstrate that G²Editor effectively supports object repositioning, insertion, and deletion within a unified framework, outperforming existing methods in both pose controllability and visual quality, while also benefiting downstream data-driven tasks.
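To illustrate what a scene-level 3D bounding box layout can look like in image space, the minimal sketch below projects one 3D box into a camera image with a pinhole model and returns its 2D extent. The abstract does not say how the layout is rendered, so everything here is an assumption for illustration and not the authors' implementation: the function names `box_corners` and `project_box`, the camera convention (x right, y down, z forward), and the intrinsics `fx`, `fy`, `u0`, `v0`.

```python
import math


def box_corners(center, size, yaw):
    """Return the 8 corners of a 3D box in camera coordinates.

    Assumed convention (hypothetical, not from the paper): x right, y down,
    z forward; yaw rotates the box about the vertical (y) axis.
    """
    cx, cy, cz = center
    length, height, width = size
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-length / 2, length / 2):
        for dy in (-height / 2, height / 2):
            for dz in (-width / 2, width / 2):
                # Rotate the offset about the y axis, then translate.
                x = c * dx + s * dz
                z = -s * dx + c * dz
                corners.append((cx + x, cy + dy, cz + z))
    return corners


def project_box(corners, fx, fy, u0, v0):
    """Project corners with a pinhole model.

    Returns (u_min, v_min, u_max, v_max), or None if any corner lies
    at or behind the image plane (such boxes are skipped in this sketch).
    """
    if any(z <= 0 for _, _, z in corners):
        return None
    us = [fx * x / z + u0 for x, _, z in corners]
    vs = [fy * y / z + v0 for _, y, z in corners]
    return min(us), min(vs), max(us), max(vs)
```

Repositioning an object then amounts to changing `center` or `yaw` and re-projecting. A full layout would repeat this for every box in the scene and would need proper clipping for boxes that straddle the image plane, which this sketch omits.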