🤖 AI Summary
Existing 3D Gaussian-based RGB-D SLAM methods need large numbers of Gaussian primitives and many optimization iterations, which limits them to under 20 fps, far behind geometry-centric approaches such as KinectFusion. To address this, we propose a Gaussian-SDF hybrid representation: a lightweight colored signed distance field (SDF), built by RGB-D fusion, models smooth geometry and appearance, while 3D Gaussians are added only to capture details the SDF underrepresents. Because the Gaussians no longer model the full scene and only refine appearance where needed, this design cuts the number of Gaussians by 50% and the optimization iterations by 75%. On real-world Azure Kinect sequences, the resulting system, GPS-SLAM, reconstructs at over 150 fps, an order-of-magnitude speedup over state-of-the-art Gaussian SLAM, while maintaining comparable reconstruction quality. To the best of our knowledge, this is the first Gaussian-based framework to achieve practical real-time 3D reconstruction.
📝 Abstract
While recent Gaussian-based SLAM methods achieve photorealistic reconstruction from RGB-D data, their computational performance remains a critical bottleneck. State-of-the-art techniques operate at less than 20 fps, significantly lagging behind geometry-centric approaches like KinectFusion (hundreds of fps). This limitation stems from a heavy computational burden: modeling a scene requires numerous Gaussians and complex iterative optimization to fit the RGB-D data, and reducing either the Gaussian count or the number of optimization iterations causes severe quality degradation. To address this, we propose a Gaussian-SDF hybrid representation, combining a colorized Signed Distance Field (SDF) for smooth geometry and appearance with 3D Gaussians to capture underrepresented details. The SDF is efficiently constructed via RGB-D fusion (as in geometry-centric methods), while the Gaussians undergo iterative optimization. Our representation enables drastic Gaussian reduction (50% fewer) by avoiding full-scene Gaussian modeling, and efficient Gaussian optimization (75% fewer iterations) through targeted appearance refinement. Building upon this representation, we develop GPS-SLAM (Gaussian-Plus-SDF SLAM), a real-time 3D reconstruction system achieving over 150 fps on real-world Azure Kinect sequences, delivering an order-of-magnitude speedup over state-of-the-art techniques while maintaining comparable reconstruction quality. We will release the source code and data to facilitate future research.
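To make the "SDF constructed via RGB-D fusion" step concrete, the sketch below shows the weighted running average commonly used in KinectFusion-style pipelines to fuse one frame into a colored truncated SDF. It is an illustration under stated assumptions, not the GPS-SLAM implementation: the function name `fuse_observation`, its parameters, and the flat voxel layout are hypothetical, every valid observation is given weight 1, and the projection of voxels into the depth image is assumed to have already produced the per-voxel distances `sdf_obs`.

```python
import numpy as np

def fuse_observation(tsdf, color, weight, sdf_obs, color_obs,
                     trunc=0.05, max_weight=64.0):
    """Fuse one frame into a colored TSDF with a weighted running average.

    tsdf:      (N,)   running signed distance, normalized to [-1, 1]
    color:     (N, 3) running RGB average
    weight:    (N,)   accumulated observation weights
    sdf_obs:   (N,)   this frame's signed distance per voxel in meters
                      (NaN where the voxel was not observed) -- assumed input
    color_obs: (N, 3) this frame's RGB per voxel -- assumed input
    """
    # Skip unobserved voxels and voxels far behind the observed surface.
    valid = np.isfinite(sdf_obs) & (sdf_obs > -trunc)

    # Truncate and normalize the new distances; invalid entries become 0.
    d = np.where(valid, np.clip(sdf_obs / trunc, -1.0, 1.0), 0.0)

    # Each valid observation contributes weight 1 (a simplifying assumption).
    w_obs = valid.astype(tsdf.dtype)
    w_sum = weight + w_obs
    denom = np.maximum(w_sum, 1.0)  # avoids division by zero for unseen voxels

    new_tsdf = np.where(valid, (weight * tsdf + w_obs * d) / denom, tsdf)
    new_color = np.where(
        valid[:, None],
        (weight[:, None] * color + w_obs[:, None] * color_obs) / denom[:, None],
        color,
    )
    # Cap the weight so the model can still adapt to later observations.
    new_weight = np.minimum(w_sum, max_weight)
    return new_tsdf, new_color, new_weight


# Two voxels, two frames: voxel 0 is observed twice, voxel 1 never.
tsdf, color, weight = np.ones(2), np.zeros((2, 3)), np.zeros(2)
frames = [
    (np.array([0.01, np.nan]), np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])),
    (np.array([0.03, np.nan]), np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])),
]
for sdf_obs, color_obs in frames:
    tsdf, color, weight = fuse_observation(tsdf, color, weight, sdf_obs, color_obs)
```

The update is closed-form and runs once per frame with no iterative optimization. That is consistent with the abstract's point that the SDF part is cheap to build, while the Gaussians are the only component that undergoes iterative optimization.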