🤖 AI Summary
Traditional spike-camera-based 3D reconstruction relies on a cascaded pipeline (spikes → images → poses → 3D), in which image reconstruction errors propagate and accumulate across stages, degrading geometric and textural fidelity. This paper proposes USP-Gaussian, an end-to-end jointly optimized framework that unifies image reconstruction from spike streams, camera pose correction, and 3D Gaussian splatting. Leveraging the multi-view consistency of 3DGS and the motion-capture capability of the spike camera, the three modules are co-optimized iteratively. Key technical components include a differentiable spike-to-image network, a differentiable 3D Gaussian splatting renderer, a joint photometric-geometric loss function, and a pose self-optimization module. On synthetic data with accurate poses, the method outperforms previous approaches by avoiding cascading errors; on real-world data with inaccurate initial poses, it still achieves robust reconstruction, reducing noise and preserving fine texture details better than alternative methods.
📝 Abstract
Spike cameras, an innovative type of neuromorphic camera that captures scenes as a binary (0/1) spike stream at 40 kHz, are increasingly employed for 3D reconstruction via Neural Radiance Fields (NeRF) or 3D Gaussian Splatting (3DGS). Previous spike-based 3D reconstruction approaches often employ a cascaded pipeline: they first reconstruct high-quality images from spike streams using established spike-to-image reconstruction algorithms, then proceed to camera pose estimation and 3D reconstruction. However, this cascaded approach suffers from substantial cumulative errors: quality limitations of the initial image reconstructions negatively impact pose estimation, ultimately degrading the fidelity of the 3D reconstruction. To address these issues, we propose a synergistic optimization framework, USP-Gaussian, that unifies spike-based image reconstruction, pose correction, and Gaussian splatting into an end-to-end framework. Leveraging the multi-view consistency afforded by 3DGS and the motion capture capability of the spike camera, our framework enables a joint iterative optimization that seamlessly integrates information between the spike-to-image network and 3DGS. Experiments on synthetic datasets with accurate poses demonstrate that our method surpasses previous approaches by effectively eliminating cascading errors. Moreover, we integrate pose optimization to achieve robust 3D reconstruction in real-world scenarios with inaccurate initial poses, outperforming alternative methods by effectively reducing noise and preserving fine texture details. Our code, data, and trained models will be available at https://github.com/chenkang455/USP-Gaussian.
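To make the binary spike stream and the first stage of the cascaded pipeline more concrete, here is a minimal Python sketch. It assumes a simple integrate-and-fire pixel model and uses a windowed spike-count average as a baseline spike-to-image step. This is an illustration only, not the spike-to-image network used in USP-Gaussian; the function names, the threshold, and the window size are all hypothetical.

```python
import numpy as np

def simulate_spikes(intensity, n_steps, threshold=1.0):
    """Assumed integrate-and-fire model: each pixel accumulates its
    intensity every step and emits a 1 when the accumulator reaches
    the threshold, then subtracts the threshold."""
    acc = np.zeros_like(intensity, dtype=np.float64)
    spikes = np.zeros((n_steps,) + intensity.shape, dtype=np.uint8)
    for t in range(n_steps):
        acc += intensity
        fired = acc >= threshold
        spikes[t] = fired          # binary frame for this time step
        acc[fired] -= threshold    # keep the residual charge
    return spikes

def window_average(spikes, center, half_width):
    """Baseline reconstruction: the firing rate inside a temporal
    window around `center` approximates the normalized intensity."""
    lo = max(0, center - half_width)
    hi = min(spikes.shape[0], center + half_width + 1)
    return spikes[lo:hi].mean(axis=0)

# Static toy scene with normalized intensities in [0, 1].
intensity = np.array([[0.0, 0.25], [0.5, 1.0]])
spikes = simulate_spikes(intensity, n_steps=400)
recon = window_average(spikes, center=200, half_width=100)
print(recon)
```

In this kind of baseline, a wider window lowers quantization noise for static content but blurs anything that moves within the window. Errors of that sort in the reconstructed images are what a cascaded pipeline would pass on to pose estimation and 3D reconstruction.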