GenAssets: Generating in-the-wild 3D Assets in Latent Space

📅 2026-04-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing methods struggle to produce complete, high-quality 3D assets for multi-sensor simulation from real-world driving scenes, where actors are observed sparsely and are often heavily occluded. This work proposes a “reconstruct-then-generate” approach. First, occlusion-aware neural rendering, trained jointly across multiple scenes on LiDAR and camera data, builds a high-quality latent space for objects. A 3D latent diffusion model is then trained in that space to synthesize diverse assets with complete geometry and appearance. The authors report that the method outperforms existing reconstruction-based and generation-based approaches on real-world driving data, enabling diverse and scalable content creation for autonomous driving simulation.

📝 Abstract

High-quality 3D assets for traffic participants are critical for multi-sensor simulation, which is essential for the safe end-to-end development of autonomy. Building assets from in-the-wild data is key for diversity and realism, but existing neural-rendering based reconstruction methods are slow and generate assets that render well only from viewpoints close to the original observations, limiting their usefulness in simulation. Recent diffusion-based generative models build complete and diverse assets, but perform poorly on in-the-wild driving scenes, where observed actors are captured under sparse and limited fields of view, and are partially occluded. In this work, we propose a 3D latent diffusion model that learns on in-the-wild LiDAR and camera data captured by a sensor platform and generates high-quality 3D assets with complete geometry and appearance. Key to our method is a "reconstruct-then-generate" approach that first leverages occlusion-aware neural rendering trained over multiple scenes to build a high-quality latent space for objects, and then trains a diffusion model that operates on the latent space. We show our method outperforms existing reconstruction and generation based methods, unlocking diverse and scalable content creation for simulation.
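The generation stage described above, a diffusion model operating on object latents, can be illustrated with a minimal sketch. It assumes a standard DDPM-style formulation (linear beta schedule, noise-prediction loss); the abstract does not state the paper's actual latent representation, noise schedule, or denoiser architecture, so the names `make_alpha_bars`, `add_noise`, and `noise_prediction_loss`, the 8-dimensional latent, and the zero-valued "prediction" are all hypothetical placeholders.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(z0, t, alpha_bars, rng):
    """Sample z_t from q(z_t | z_0); return z_t and the Gaussian noise used."""
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    a = alpha_bars[t]
    zt = [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, eps)]
    return zt, eps


def noise_prediction_loss(predicted_eps, eps):
    """Mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(predicted_eps, eps)) / len(eps)


rng = random.Random(0)

# Hypothetical stand-in for an object latent produced by the reconstruction
# stage; a real latent would come from the occlusion-aware neural renderer.
z0 = [rng.gauss(0.0, 1.0) for _ in range(8)]

alpha_bars = make_alpha_bars(1000)
zt, eps = add_noise(z0, 500, alpha_bars, rng)

# A trained denoiser would predict eps from (zt, t). The all-zero prediction
# below is only a placeholder so the loss can be evaluated.
loss = noise_prediction_loss([0.0] * len(eps), eps)
```

During training, the loss would be minimized over many latents and timesteps; at sampling time, the learned denoiser would be applied iteratively from pure noise to obtain a new latent, which is then decoded into an asset.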

Problem

Research questions and friction points this paper is trying to address.

3D asset generation

in-the-wild data

neural rendering

occlusion

simulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D latent diffusion

occlusion-aware neural rendering

in-the-wild 3D asset generation