Gaussian-Voxel Duet: A Dual-Scaffolding Hybrid Representation for Fast and Accurate Monocular Surface Reconstruction

📅 2026-05-26
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Existing monocular surface reconstruction methods struggle to balance geometric accuracy against optimization efficiency: 3D Gaussian Splatting often suffers from geometric distortions caused by redundant primitives, while neural signed-distance fields (SDFs) improve fidelity at high computational cost. This work proposes a hybrid Gaussian-Voxel representation that anchors 3D Gaussians to a jointly optimized sparse voxel SDF scaffold and introduces an implicit surface tethering loss that pulls the Gaussians toward the SDF-defined iso-surface. Evaluated on real-world indoor scenes from ScanNet++, ScanNetv2, and DeepBlending, the method is reported to achieve state-of-the-art surface reconstruction quality and superior novel view synthesis against leading baselines, while maintaining fast training convergence and real-time rendering.
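The idea of a loss that pulls Gaussians toward the SDF iso-surface can be illustrated with a toy sketch. This is not the paper's formulation: it assumes the loss is the mean absolute SDF value at Gaussian centers, and it substitutes an analytic sphere SDF for the learned voxel SDF. The names `sphere_sdf`, `surface_pull_loss`, and `pull_step` are hypothetical.

```python
import numpy as np

def sphere_sdf(points, radius=1.0):
    """Signed distance to a sphere at the origin (stand-in for a learned SDF)."""
    return np.linalg.norm(points, axis=-1) - radius

def surface_pull_loss(centers, sdf):
    """Assumed loss: mean |SDF| at Gaussian centers; zero when all lie on the surface."""
    return float(np.mean(np.abs(sdf(centers))))

def pull_step(centers, sdf, step=1.0, eps=1e-4):
    """Move each center against the SDF gradient by step * sdf(center)."""
    d = sdf(centers)
    grad = np.zeros_like(centers)
    for axis in range(3):
        offset = np.zeros(3)
        offset[axis] = eps
        # Central finite difference approximates the SDF gradient along this axis.
        grad[:, axis] = (sdf(centers + offset) - sdf(centers - offset)) / (2 * eps)
    grad /= np.linalg.norm(grad, axis=-1, keepdims=True) + 1e-12
    return centers - step * d[:, None] * grad

# Hypothetical Gaussian centers scattered within +/-0.5 of the unit sphere.
rng = np.random.default_rng(0)
directions = rng.normal(size=(256, 3))
directions /= np.linalg.norm(directions, axis=-1, keepdims=True)
centers = directions * rng.uniform(0.5, 1.5, size=(256, 1))

loss_before = surface_pull_loss(centers, sphere_sdf)
pulled = pull_step(centers, sphere_sdf)
loss_after = surface_pull_loss(pulled, sphere_sdf)
print(loss_before, loss_after)
```

In a real pipeline the SDF would presumably be interpolated from the voxel scaffold and the loss minimized by gradient descent together with the rendering objective; the explicit projection step here only makes the geometric effect visible.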
📝 Abstract
While 3D Gaussian Splatting has achieved remarkable success in photorealistic novel view synthesis, its pursuit of fast and high-fidelity 3D reconstruction has long been constrained by a trade-off between geometric accuracy and optimization efficiency. Methods specialized in image rendering converge quickly at the cost of imperfect geometry caused by superfluous primitives overfitting training views, while methods integrating a neural signed-distance field (SDF) for better geometry incur prohibitive training costs. In this paper, we attempt to strike a better trade-off by tethering scaffold-anchored Gaussians to a jointly optimized sparse voxel scaffold. This hybrid Gaussian-Voxel representation explicitly confines anchored Gaussians to a narrow band around surfaces defined by voxelized SDFs, which effectively improves representation efficiency and condenses floating Gaussians without sacrificing geometry quality. An implicit surface tethering loss further pulls individual Gaussian primitives closer to SDF-induced surfaces in a mutually regularized manner for improved reconstruction accuracy. Extensive experiments on diverse real-world indoor scenes from the ScanNet++, ScanNetv2, and DeepBlending datasets demonstrate that our method achieves state-of-the-art surface reconstruction quality as well as superior novel view synthesis against leading baselines, while maintaining fast training convergence and real-time rendering. Code will be available at https://github.com/duzh11/VoxelGS.
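The narrow-band confinement described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the released code: the SDF is stored on a small dense grid rather than a sparse voxel structure, the grid is filled from an analytic unit sphere, and the helper names (`narrow_band_mask`, `keep_gaussians_in_band`) and the band width are hypothetical.

```python
import numpy as np

def narrow_band_mask(sdf_grid, band_width):
    """Voxels whose SDF magnitude is within band_width of the zero level set."""
    return np.abs(sdf_grid) <= band_width

def keep_gaussians_in_band(centers, sdf_grid, origin, voxel_size, band_width):
    """Boolean mask of Gaussian centers that fall inside a narrow-band voxel."""
    idx = np.floor((centers - origin) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(sdf_grid.shape)), axis=-1)
    band = narrow_band_mask(sdf_grid, band_width)
    keep = np.zeros(len(centers), dtype=bool)
    i = idx[inside]
    # Centers outside the grid stay False; the rest inherit their voxel's band flag.
    keep[inside] = band[i[:, 0], i[:, 1], i[:, 2]]
    return keep

# Toy scaffold: 32^3 grid over [-1.5, 1.5]^3 holding the SDF of a unit sphere.
res = 32
origin = np.array([-1.5, -1.5, -1.5])
voxel_size = 3.0 / res
coords = origin[0] + (np.arange(res) + 0.5) * voxel_size  # voxel-center coordinates
gx, gy, gz = np.meshgrid(coords, coords, coords, indexing="ij")
sdf_grid = np.sqrt(gx**2 + gy**2 + gz**2) - 1.0

band_width = 2 * voxel_size  # assumed band half-width, two voxels
band = narrow_band_mask(sdf_grid, band_width)

# One center on the surface, one floater deep inside, one outside the grid.
centers = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
keep = keep_gaussians_in_band(centers, sdf_grid, origin, voxel_size, band_width)
print(keep, band.mean())
```

Only the near-surface center survives the filter, which mirrors how restricting Gaussians to a band around the SDF surface would discard floaters. A sparse structure would store just the band voxels instead of the full grid.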
Problem

Research questions and friction points this paper is trying to address.

Monocular Surface Reconstruction
3D Gaussian Splatting
Geometric Accuracy
Optimization Efficiency
Neural Signed-Distance Field
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting
Signed Distance Field (SDF)
Hybrid Representation
Monocular Surface Reconstruction
Sparse Voxel Scaffold