Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation

📅 2026-04-20

🤖 AI Summary

Existing neural scene reconstruction methods struggle to produce complete 3D assets from autonomous driving logs that hold up under wide-baseline view synthesis and agent interaction. This work proposes an end-to-end image-to-3D asset generation system built on a large-scale object-centric dataset, constructed through geometry-aware preprocessing, heterogeneous sensor alignment, and hybrid data augmentation. It introduces SparseViewDiT, a multi-view diffusion model designed to generate coherent 3D structures from sparse, viewpoint-constrained real-world observations. The generated geometry is further refined via 3D Gaussian Splatting combined with a self-distillation strategy to recover fine-grained detail. The resulting framework is aimed at scalable production of simulation-ready 3D assets that support novel view synthesis from large viewpoint changes and agent interaction in simulated environments.

📝 Abstract

Closed-loop simulation is a core component of autonomous vehicle (AV) development, enabling scalable testing, training, and safety validation before real-world deployment. Neural scene reconstruction converts driving logs into interactive 3D environments for simulation, but it does not produce the complete 3D object assets required for agent manipulation and large-viewpoint novel-view synthesis. To address this challenge, we present Asset Harvester, an image-to-3D model and end-to-end pipeline that converts sparse, in-the-wild object observations from real driving logs into complete, simulation-ready assets. Rather than relying on a single model component, we develop a system-level design for real-world AV data that combines large-scale curation of object-centric training tuples, geometry-aware preprocessing across heterogeneous sensors, and a robust training recipe that couples sparse-view-conditioned multiview generation with 3D Gaussian lifting. Within this system, SparseViewDiT is explicitly designed to address limited-angle views and other real-world data challenges. Together with hybrid data curation, augmentation, and self-distillation, this system enables scalable conversion of sparse AV object observations into reusable 3D assets.
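
To make the "limited-angle views" problem concrete, the following is a minimal sketch of how one might measure the angular coverage of an object across a drive-by. It is not taken from the paper: the function names `object_frame_azimuths` and `angular_coverage`, the 2D top-down simplification, and the example coordinates are all assumptions made for illustration.

```python
import math


def object_frame_azimuths(camera_xy, object_xy, object_yaw):
    """Azimuth in degrees, in [0, 360), of each camera as seen from the object.

    Hypothetical helper: positions are 2D world coordinates in metres and
    object_yaw is the object's heading in radians (top-down simplification).
    """
    azimuths = []
    for cx, cy in camera_xy:
        world_angle = math.atan2(cy - object_xy[1], cx - object_xy[0])
        # Express the viewing direction relative to the object's heading.
        azimuths.append(math.degrees(world_angle - object_yaw) % 360.0)
    return azimuths


def angular_coverage(azimuths_deg):
    """Size in degrees of the smallest arc containing all viewing directions."""
    if not azimuths_deg:
        return 0.0
    ordered = sorted(a % 360.0 for a in azimuths_deg)
    # Gaps between neighbouring views, plus the wrap-around gap.
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    gaps.append(360.0 - ordered[-1] + ordered[0])
    # Everything outside the largest empty gap is the observed arc.
    return 360.0 - max(gaps)


# Illustrative drive-by: the ego vehicle passes a parked car 3 m to its side.
cameras = [(-10.0, -3.0), (0.0, -3.0), (10.0, -3.0)]
coverage = angular_coverage(object_frame_azimuths(cameras, (0.0, 0.0), 0.0))
print(f"observed arc: {coverage:.1f} degrees")
```

In this made-up example the object is seen over well under half of the full circle, so one whole side is never observed. That unobserved region is the kind of gap a sparse-view-conditioned generative model would have to fill in.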

Problem

Research questions and friction points this paper is trying to address.

3D asset extraction

autonomous driving logs

closed-loop simulation

sparse-view reconstruction

neural scene reconstruction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Asset Harvester

Sparse-view 3D reconstruction

3D Gaussian Splatting