🤖 AI Summary
Autonomous driving (AD) simulation urgently requires realistic, controllable, and reusable dynamic 3D assets; however, existing neural reconstruction methods, such as 3D Gaussian Splatting (3DGS), are constrained by per-scene optimization, yielding incomplete object geometry, entangled illumination, limited editability, and poor cross-scene generalizability. To address this, we propose R3D2, a lightweight single-step diffusion model that combines neural rendering with diffusion-based generation. R3D2 enables real-time synthesis of complete, physically grounded 3D assets with illumination-consistent shading and cast shadows, and supports text-driven asset insertion. We introduce the first 3DGS-based asset synthesis dataset tailored for AD simulation and demonstrate cross-scene asset reuse. Quantitative evaluation and user studies confirm R3D2's superiority over baselines in visual fidelity and physical consistency, establishing a scalable paradigm for safety-critical AD validation.
📝 Abstract
Validating autonomous driving (AD) systems requires diverse and safety-critical testing, making photorealistic virtual environments essential. Traditional simulation platforms, while controllable, are resource-intensive to scale and often suffer from a domain gap with real-world data. In contrast, neural reconstruction methods like 3D Gaussian Splatting (3DGS) offer a scalable solution for creating photorealistic digital twins of real-world driving scenes. However, they struggle with dynamic object manipulation and reusability, as their per-scene optimization tends to produce incomplete object models with baked-in illumination effects. This paper introduces R3D2, a lightweight, one-step diffusion model designed to overcome these limitations and enable realistic insertion of complete 3D assets into existing scenes by generating plausible rendering effects, such as shadows and consistent lighting, in real time. This is achieved by training R3D2 on a novel dataset: 3DGS object assets are generated from in-the-wild AD data using an image-conditioned 3D generative model, and then synthetically placed into neural rendering-based virtual environments, allowing R3D2 to learn realistic integration. Quantitative and qualitative evaluations demonstrate that R3D2 significantly enhances the realism of inserted assets, enabling use cases like text-to-3D asset insertion and cross-scene/dataset object transfer, allowing for true scalability in AD validation. To promote further research in scalable and realistic AD simulation, we will release our dataset and code; see https://research.zenseact.com/publications/R3D2/.
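To make the described pipeline concrete, the sketch below illustrates the two stages the abstract implies: a naive alpha-composite of a rendered asset into a scene image (which by itself lacks shadows and lighting harmonization), followed by a single refinement call standing in for the one-step diffusion model. This is a minimal illustration of the concept, not the paper's implementation; the function names, the fixed noise level, and the `denoiser` callable are all hypothetical.

```python
import numpy as np

def composite_asset(scene, asset_rgb, asset_alpha, top, left):
    """Naively alpha-composite a rendered asset into a scene image.

    The result lacks cast shadows and illumination-consistent shading;
    those are exactly the effects the one-step diffusion refiner is
    trained to add. All arrays are floats in [0, 1].
    """
    out = scene.copy()
    h, w = asset_rgb.shape[:2]
    region = out[top:top + h, left:left + w]
    a = asset_alpha[..., None]  # broadcast alpha over RGB channels
    out[top:top + h, left:left + w] = a * asset_rgb + (1.0 - a) * region
    return out

def one_step_refine(image, denoiser, noise_level=0.5):
    """Hypothetical single-step refinement: one call to a trained
    denoiser at a fixed noise level, in place of an iterative
    diffusion sampling loop (this is what makes it real-time)."""
    return np.clip(denoiser(image, noise_level), 0.0, 1.0)
```

In a real system, `denoiser` would be a trained network conditioned on the composited image; here an identity function suffices to demonstrate the control flow:

```python
scene = np.zeros((4, 4, 3))                 # dark background
asset_rgb = np.ones((2, 2, 3))              # white 2x2 asset
asset_alpha = np.ones((2, 2))               # fully opaque
naive = composite_asset(scene, asset_rgb, asset_alpha, top=1, left=1)
refined = one_step_refine(naive, denoiser=lambda img, t: img)
```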