🤖 AI Summary
To address NeRF reconstruction failure from event-camera data under non-ideal conditions, such as noisy poses, non-uniform event sequences, and varying scene scales, this paper proposes AE-NeRF, a robust event-driven neural radiance field method. The approach builds on an event-based NeRF (e-NeRF) framework and is evaluated on both synthetic and real-world event data. Key contributions include: (1) a pose-correction module learned jointly with the e-NeRF, enabling robust reconstruction from inaccurate camera poses; (2) hierarchical event distillation, a two-stage architecture (a proposal e-NeRF plus a vanilla e-NeRF) that exploits event-stream density to resample and refine the reconstruction; and (3) complementary event reconstruction and temporal consistency losses that improve view consistency. Evaluated on a newly constructed benchmark covering large-scale scenes under simulated non-ideal conditions, the method achieves state-of-the-art performance, improving view consistency and geometric accuracy and generalizing from object-level to large-scale scenes.
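The event reconstruction loss summarized above can be illustrated with the standard event-camera generation model: an event fires when the log intensity at a pixel changes by a contrast threshold C, so the rendered log-intensity change between two timestamps should match C times the accumulated event polarities. The sketch below is a minimal, hypothetical illustration of that idea; the function name, the per-pixel formulation, and the value of C are assumptions, not the paper's actual implementation.

```python
import numpy as np

def event_reconstruction_loss(log_I_t0, log_I_t1, polarity_sum, C=0.25):
    """Hypothetical sketch of an event reconstruction loss.

    log_I_t0, log_I_t1 : rendered log intensities per pixel at two timestamps
    polarity_sum       : signed count of events per pixel in (t0, t1]
    C                  : assumed contrast threshold of the event camera
    """
    predicted = log_I_t1 - log_I_t0   # change rendered by the radiance field
    observed = C * polarity_sum       # change implied by the event stream
    return float(np.mean((predicted - observed) ** 2))
```

A pixel that rendered a log-intensity increase of 0.5 and recorded two positive events (2 x 0.25 = 0.5) would incur zero loss under this model.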
📝 Abstract
Compared to frame-based methods, computational neuromorphic imaging using event cameras offers significant advantages, such as minimal motion blur, enhanced temporal resolution, and high dynamic range. The multi-view consistency of Neural Radiance Fields, combined with the unique benefits of event cameras, has spurred recent research into reconstructing NeRF from data captured by moving event cameras. While showing impressive performance, existing methods rely on ideal conditions, with uniform, high-quality event sequences and accurate camera poses available, and mainly focus on object-level reconstruction, thus limiting their practical applications. In this work, we propose AE-NeRF to address the challenges of learning event-based NeRF under non-ideal conditions, including non-uniform event sequences, noisy poses, and scenes of various scales. Our method exploits the density of event streams and jointly learns a pose correction module with an event-based NeRF (e-NeRF) framework for robust 3D reconstruction from inaccurate camera poses. To generalize to larger scenes, we propose hierarchical event distillation with a proposal e-NeRF network and a vanilla e-NeRF network to resample and refine the reconstruction process. We further propose an event reconstruction loss and a temporal loss to improve the view consistency of the reconstructed scene. We establish a comprehensive benchmark that includes large-scale scenes to simulate practical non-ideal conditions, incorporating both synthetic and challenging real-world event datasets. The experimental results show that our method achieves a new state-of-the-art in event-based 3D reconstruction.
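The proposal-plus-vanilla design in the abstract follows the usual hierarchical sampling pattern: a cheap proposal network yields per-bin weights along each ray, and fine samples for the second network are drawn from the piecewise-constant PDF those weights define. The sketch below shows only this generic inverse-transform resampling step; the function name and interfaces are hypothetical and not taken from the paper.

```python
import numpy as np

def resample_along_ray(t_edges, weights, n_fine, rng=None):
    """Draw fine sample depths from the piecewise-constant PDF defined by
    proposal-network weights (one weight per bin between t_edges)."""
    rng = np.random.default_rng(0) if rng is None else rng
    pdf = weights / (weights.sum() + 1e-8)
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])   # len(t_edges) entries
    u = rng.uniform(0.0, 1.0, n_fine)
    idx = np.searchsorted(cdf, u, side="right") - 1 # bin containing each u
    idx = np.clip(idx, 0, len(t_edges) - 2)
    # Interpolate linearly inside each selected bin.
    denom = np.maximum(cdf[idx + 1] - cdf[idx], 1e-8)
    frac = (u - cdf[idx]) / denom
    return t_edges[idx] + frac * (t_edges[idx + 1] - t_edges[idx])
```

Concentrating all proposal weight in one bin makes every fine sample land inside that bin, which is how the second-stage network refines only the regions the proposal network found occupied.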