🤖 AI Summary
This paper addresses key challenges in monocular video-based online 3D reconstruction—including absence of depth supervision, inaccurate Gaussian distribution modeling, and local-global inconsistency—by proposing a real-time, RGB-only Gaussian mapping method. The approach introduces three core contributions: (1) a hierarchical Gaussian management module that enables dynamic, scale-adaptive 3D Gaussian ellipsoid placement; (2) a compact spatial representation based on multi-level occupancy hash voxels (MOHV); and (3) a global consistency optimization framework jointly enforcing photometric and geometric constraints. Crucially, the method requires neither depth maps nor pre-trained models and seamlessly integrates with standard visual odometry pipelines. It achieves real-time performance (>20 FPS) while significantly improving geometric accuracy and texture fidelity. Extensive evaluation demonstrates state-of-the-art results across multiple benchmarks, outperforming existing online RGB and RGB-D reconstruction methods.
📝 Abstract
We propose an online 3D Gaussian-based dense mapping framework for photorealistic detail reconstruction from a monocular image stream. Our approach addresses two key challenges in monocular online reconstruction: distributing Gaussians without relying on depth maps, and ensuring both local and global consistency in the reconstructed maps. To achieve this, we introduce two key modules: the Hierarchical Gaussian Management Module for effective Gaussian distribution, and the Global Consistency Optimization Module for maintaining alignment and coherence at all scales. In addition, we present Multi-level Occupancy Hash Voxels (MOHV), a structure that regularizes Gaussians to capture details across multiple levels of granularity. MOHV ensures accurate reconstruction of both fine and coarse geometries and textures, preserving intricate details while maintaining overall structural integrity. Compared to state-of-the-art RGB-only and even RGB-D methods, our framework achieves superior reconstruction quality with high computational efficiency. Moreover, it integrates seamlessly with various tracking systems, ensuring generality and scalability.
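The abstract does not describe how MOHV is implemented. As a rough illustration of the general idea only, the sketch below shows one plausible way a multi-level occupancy hash could regularize Gaussian placement: each level keeps a sparse set of occupied voxel keys, and a new Gaussian is admitted at a level only if its voxel is still free. The class name `MultiLevelOccupancyHash`, the halving of voxel size per level, and the one-Gaussian-per-voxel rule are all assumptions made for this example, not details taken from the paper.

```python
import math


class MultiLevelOccupancyHash:
    """Toy multi-level occupancy hash (hypothetical, not the paper's MOHV)."""

    def __init__(self, base_size=0.4, num_levels=3):
        # Assumption: level 0 is the coarsest level, and each finer level
        # halves the voxel edge length.
        self.sizes = [base_size / (2 ** level) for level in range(num_levels)]
        # One sparse set of integer voxel keys per level; Python's built-in
        # set hashing stands in for a spatial hash table.
        self.occupied = [set() for _ in range(num_levels)]

    def key(self, point, level):
        """Map a 3D point to the integer voxel index at the given level."""
        size = self.sizes[level]
        return tuple(math.floor(coord / size) for coord in point)

    def try_insert(self, point, level):
        """Claim the voxel containing `point` at `level`.

        Returns True if the voxel was free (a Gaussian could be spawned
        there) and False if it was already occupied.
        """
        voxel = self.key(point, level)
        if voxel in self.occupied[level]:
            return False
        self.occupied[level].add(voxel)
        return True


grid = MultiLevelOccupancyHash(base_size=0.4, num_levels=3)
first = grid.try_insert((0.05, 0.05, 0.05), level=0)      # free coarse voxel
duplicate = grid.try_insert((0.35, 0.35, 0.35), level=0)  # same coarse voxel
finer = grid.try_insert((0.35, 0.35, 0.35), level=2)      # finer level is independent
```

Under these assumptions, the coarse level caps the Gaussian density in smooth regions, while finer levels still admit extra Gaussians where detail is needed, which matches the abstract's stated goal of capturing both coarse and fine structure.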