🤖 AI Summary
To address limitations of Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), including slow training/rendering, poor support for heterogeneous sensors (e.g., cameras and rotating LiDAR), and tight coupling between scene representation and rendering, this paper proposes Sparse Local Fields (SaLF), a novel voxelized implicit field that explicitly decouples representation from rendering. SaLF employs sparse 3D voxels as primitives to construct local implicit fields and natively supports both non-pinhole cameras and rotating LiDAR within a unified framework. It integrates adaptive pruning/densification, GPU-accelerated rasterization, and differentiable ray tracing. SaLF trains in under 30 minutes and renders in real time (≥50 FPS for cameras, ≥600 FPS for LiDAR) while matching state-of-the-art reconstruction quality. SaLF significantly improves efficiency, generalizability, and scalability for large-scale scene modeling and real-time simulation.
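To make the representation concrete, here is a minimal PyTorch sketch of what a sparse set of voxel primitives with per-voxel local implicit fields could look like. The class name, feature dimensionality, and the shared tiny-decoder design are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class SparseLocalFields(nn.Module):
    """Hypothetical sketch: a sparse set of voxel primitives, where each
    voxel carries learnable features that parameterize a local implicit
    field (volume density + RGB) over the voxel's local coordinates."""

    def __init__(self, voxel_centers: torch.Tensor, voxel_size: float,
                 feat_dim: int = 8, hidden: int = 32):
        super().__init__()
        self.register_buffer("centers", voxel_centers)  # (V, 3) world-space centers
        self.voxel_size = voxel_size
        # Per-voxel learnable features; pruning/densification would remove or
        # add rows of this tensor as the scene is refined.
        self.feats = nn.Parameter(torch.zeros(len(voxel_centers), feat_dim))
        # A small shared decoder maps (features, local xyz) -> (density, rgb),
        # keeping per-voxel storage compact.
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),
        )

    def query(self, voxel_idx: torch.Tensor, pts: torch.Tensor):
        """Evaluate the local fields at world-space points `pts` (N, 3)
        lying inside the voxels indexed by `voxel_idx` (N,)."""
        local = (pts - self.centers[voxel_idx]) / self.voxel_size  # ~[-0.5, 0.5]^3
        out = self.decoder(torch.cat([self.feats[voxel_idx], local], dim=-1))
        density = torch.relu(out[..., :1])   # non-negative volume density
        rgb = torch.sigmoid(out[..., 1:])    # colors in [0, 1]
        return density, rgb
```

Because the primitives expose only a point-wise `query`, the same representation can be handed to either a rasterizer or a ray tracer, which is the decoupling the summary describes.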
📝 Abstract
High-fidelity sensor simulation of light-based sensors such as cameras and LiDARs is critical for safe and accurate autonomy testing. Neural radiance field (NeRF)-based methods that reconstruct sensor observations via ray-casting of implicit representations have demonstrated accurate simulation of driving scenes, but are slow to train and render, hampering scalability. 3D Gaussian Splatting (3DGS) has demonstrated faster training and rendering times through rasterization, but is primarily restricted to pinhole camera sensors, preventing usage for realistic multi-sensor autonomy evaluation. Moreover, both NeRF and 3DGS couple the representation with the rendering procedure (implicit networks for ray-based evaluation, particles for rasterization), preventing interoperability, which is key for general usage. In this work, we present Sparse Local Fields (SaLF), a novel volumetric representation that supports both rasterization and ray tracing. SaLF represents volumes as a sparse set of 3D voxel primitives, where each voxel is a local implicit field. SaLF trains quickly (<30 min), renders fast (50+ FPS for camera, 600+ FPS for LiDAR), adaptively prunes and densifies to easily handle large scenes, and supports non-pinhole cameras and spinning LiDARs. We demonstrate that SaLF achieves realism comparable to existing self-driving sensor simulation methods while improving efficiency and enhancing capabilities, enabling more scalable simulation. https://waabi.ai/salf/
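As a sketch of the ray-tracing path (the one a spinning-LiDAR beam or a non-pinhole camera ray would take), the snippet below marches a single ray through the sparse voxel set and alpha-composites the queried local fields using standard volume rendering. It builds on the hypothetical `SparseLocalFields` class above; `voxel_lookup` is an assumed occupancy-query helper, and none of this is the paper's actual renderer:

```python
import torch

def render_ray(field, voxel_lookup, origin, direction,
               t_near=0.1, t_far=50.0, n_samples=256):
    """Sketch: volume-render one ray through sparse voxel primitives.

    field        -- a SparseLocalFields instance (see sketch above)
    voxel_lookup -- assumed helper: (S, 3) points -> (S,) voxel indices,
                    with -1 marking points in empty space
    """
    t = torch.linspace(t_near, t_far, n_samples)
    pts = origin + t[:, None] * direction            # (S, 3) sample points
    idx = voxel_lookup(pts)
    hit = idx >= 0                                   # skip empty space entirely
    density = torch.zeros(n_samples, 1)
    rgb = torch.zeros(n_samples, 3)
    if hit.any():
        density[hit], rgb[hit] = field.query(idx[hit], pts[hit])
    delta = t[1] - t[0]
    alpha = 1.0 - torch.exp(-density[:, 0] * delta)  # per-sample opacity
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = torch.cumprod(torch.cat([alpha.new_ones(1), 1.0 - alpha[:-1]]), dim=0)
    weights = trans * alpha
    color = (weights[:, None] * rgb).sum(dim=0)      # composited RGB (camera)
    depth = (weights * t).sum(dim=0)                 # expected range (LiDAR return)
    return color, depth
```

Because this routine is differentiable in `field`'s parameters, it supports gradient-based reconstruction; a rasterization path would instead project occupied voxels to the image and composite them front-to-back over the same primitives.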