🤖 AI Summary
This work addresses efficient 3D surface reconstruction from sparse multi-view images, a setting in which existing 3D Gaussian Splatting methods struggle because they rely on dense viewpoints and time-consuming per-scene optimization. The authors propose SurfelSplat, an end-to-end feed-forward framework that directly regresses pixel-aligned Gaussian surfels from sparse input views. Key innovations include low-pass filtering of the surfels, guided by the Nyquist sampling theorem and the spatial sampling rate, and a cross-view feature projection and fusion mechanism that recovers precise geometry without per-scene fine-tuning. Evaluated on the DTU benchmark, the method achieves reconstruction accuracy comparable to state-of-the-art approaches while predicting surfels within one second, a speedup of roughly 100x (two orders of magnitude).
📝 Abstract
3D Gaussian Splatting (3DGS) has demonstrated impressive performance in 3D scene reconstruction. Beyond novel view synthesis, it shows great potential for multi-view surface reconstruction. Existing methods employ optimization-based reconstruction pipelines that achieve precise and complete surface extraction. However, these approaches typically require dense input views and time-consuming per-scene optimization. To address these limitations, we propose SurfelSplat, a feed-forward framework that generates efficient and generalizable pixel-aligned Gaussian surfel representations from sparse-view images. We observe that conventional feed-forward structures struggle to recover accurate geometric attributes of Gaussian surfels because the spatial frequency of pixel-aligned primitives exceeds Nyquist sampling rates. Therefore, we propose a cross-view feature aggregation module based on the Nyquist sampling theorem. Specifically, we first adapt the geometric forms of Gaussian surfels with spatial sampling rate-guided low-pass filters. We then project the filtered surfels across all input views to obtain cross-view feature correlations. By processing these correlations through a specially designed feature fusion network, we finally regress Gaussian surfels with precise geometry. Extensive experiments on DTU reconstruction benchmarks demonstrate that our model achieves results comparable to state-of-the-art methods and predicts Gaussian surfels within 1 second, offering a 100x speedup without costly per-scene training.
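The abstract does not give the exact form of the sampling rate-guided low-pass filter, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a pinhole camera, takes the world-space footprint of one pixel at a given depth as the sampling interval, and widens a surfel's two tangent-plane scales by convolving with an isotropic Gaussian whose standard deviation is proportional to that interval. The function names and the `filter_size` parameter are hypothetical.

```python
import numpy as np


def sampling_interval(depth, focal_px):
    """World-space size of one pixel at the given depth (pinhole camera assumption)."""
    return depth / focal_px


def low_pass_surfel_scales(scales, depth, focal_px, filter_size=1.0):
    """Widen a surfel's tangent-plane scales with an isotropic Gaussian low-pass filter.

    Convolving two Gaussians adds their variances, so each filtered scale is
    sqrt(scale^2 + sigma_filter^2). `filter_size` is a hypothetical
    hyperparameter that sets the filter width in units of pixel footprints.
    """
    scales = np.asarray(scales, dtype=float)
    sigma_filter = filter_size * sampling_interval(depth, focal_px)
    return np.sqrt(scales**2 + sigma_filter**2)


if __name__ == "__main__":
    # A surfel much smaller than one pixel footprint (0.004 at depth 2, focal 500 px).
    filtered = low_pass_surfel_scales([0.001, 0.002], depth=2.0, focal_px=500.0)
    print(filtered)
```

Under these assumptions, a surfel smaller than one pixel footprint is widened to at least that footprint, which keeps its spatial frequency content within what the input image sampling can represent. A surfel already much larger than the footprint is left almost unchanged.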