🤖 AI Summary
This work addresses reconstruction failures in large-scale indoor scenes, where sparse LiDAR point clouds, geometric drift, and fixed fusion parameters commonly lead to mesh holes, over-smoothing, and boundary artifacts. To overcome these limitations, the authors propose a modular, incremental RGB-LiDAR fusion framework that, for the first time, incrementally integrates per-frame semantic labels generated by vision foundation models into truncated signed distance function (TSDF) voxels, enabling semantics-aware, high-fidelity mesh reconstruction. The system combines four stages: visual semantic labeling, LiDAR-inertial odometry mapping, semantics-aware TSDF fusion, and Marching Cubes surface extraction. On the Oxford Spires dataset, the method achieves higher geometric reconstruction accuracy than the state-of-the-art baselines ImMesh and Voxblox, and it produces semantic meshes directly suitable for Universal Scene Description (USD) asset creation and extended reality (XR) applications.
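To make "semantic labels in TSDF voxels" concrete, the following is a minimal sketch of a single voxel update. It assumes the standard weighted running-average TSDF update (Curless and Levoy style, as also used by Voxblox) plus a simple per-voxel label histogram. The names `SemanticVoxel` and `integrate`, and the rule that down-weights an observation whose label disagrees with the voxel's current majority label (`mismatch_scale`), are hypothetical illustrations, not the authors' fusion rule.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SemanticVoxel:
    """One TSDF voxel with a label histogram (hypothetical layout)."""
    sdf: float = 0.0      # fused truncated signed distance
    weight: float = 0.0   # accumulated integration weight
    labels: Counter = field(default_factory=Counter)  # per-label vote counts

    def majority_label(self):
        # Most frequently observed label so far, or None if unobserved.
        return self.labels.most_common(1)[0][0] if self.labels else None


def integrate(voxel, sdf_obs, label, trunc=0.1, w_obs=1.0,
              w_max=100.0, mismatch_scale=0.5):
    """Fuse one signed-distance observation and its semantic label."""
    # Truncate the observed signed distance to [-trunc, trunc].
    d = max(-trunc, min(trunc, sdf_obs))
    # Hypothetical semantics-aware rule: an observation whose label
    # disagrees with the current majority label gets a reduced weight.
    current = voxel.majority_label()
    w = w_obs * mismatch_scale if (current is not None and label != current) else w_obs
    # Weighted running average of the signed distance.
    voxel.sdf = (voxel.weight * voxel.sdf + w * d) / (voxel.weight + w)
    # Cap the weight so the voxel can still adapt to later observations.
    voxel.weight = min(voxel.weight + w, w_max)
    voxel.labels[label] += 1
    return voxel
```

Down-weighting rather than rejecting mismatched observations keeps the update tolerant of an occasional mislabelled frame while still letting semantics influence voxels near object boundaries.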
📝 Abstract
High-fidelity geometric mesh reconstruction from LiDAR-inertial scans remains challenging in large, complex indoor environments, such as cultural buildings, where point-cloud sparsity, geometric drift, and fixed fusion parameters produce holes, over-smoothing, and spurious surfaces at structural boundaries. We propose a modular, incremental RGB+LiDAR pipeline that builds semantics-aided, high-quality meshes from indoor scans through direct, per-scan-frame label transfer. A vision foundation model labels each incoming RGB frame; the labels are incrementally projected and fused onto a LiDAR-inertial odometry map; and an incremental, semantics-aware Truncated Signed Distance Function (TSDF) fusion step produces the final mesh via marching cubes. This frame-level fusion strategy preserves the geometric fidelity of LiDAR while using rich visual semantics to resolve geometric ambiguities at reconstruction boundaries caused by point-cloud sparsity and geometric drift. We show that semantic guidance improves geometric reconstruction quality; accordingly, quantitative evaluation uses geometric metrics on the Oxford Spires dataset, while results on the NTU VIRAL dataset are analyzed qualitatively. The proposed method outperforms the state-of-the-art geometric baselines ImMesh and Voxblox, demonstrating the benefit of semantics-aided fusion for geometric mesh quality. The resulting semantically labelled meshes are valuable for reconstructing Universal Scene Description (USD) assets, offering a path from indoor LiDAR scanning to XR and digital modeling.
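The label-transfer step can be illustrated with a pinhole projection. Each LiDAR point, expressed in the world frame of the odometry map, is transformed into the camera frame and takes the label of the pixel it lands on. This is a minimal sketch assuming known camera intrinsics (`fx`, `fy`, `cx`, `cy`), a world-to-camera pose (`R_cw`, `t_cw`), and a per-pixel label image. The function names are hypothetical. The sketch omits lens distortion, LiDAR-camera time synchronization, and occlusion (z-buffer) checks that a real pipeline would need.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def transform(R: Sequence[Sequence[float]], t: Vec3, p: Vec3) -> Vec3:
    """Apply p_c = R p + t, with R given as a row-major 3x3 nested list."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project_labels(points_world: Sequence[Vec3],
                   R_cw: Sequence[Sequence[float]], t_cw: Vec3,
                   fx: float, fy: float, cx: float, cy: float,
                   label_image: List[List[str]],
                   min_depth: float = 0.1) -> List[Optional[str]]:
    """Give each LiDAR point the label of the pixel it projects to, or None."""
    h, w = len(label_image), len(label_image[0])
    out: List[Optional[str]] = []
    for p in points_world:
        x, y, z = transform(R_cw, t_cw, p)
        if z < min_depth:
            # Behind the camera or too close to project reliably.
            out.append(None)
            continue
        # Pinhole projection; pixel i is taken to cover [i, i + 1).
        u = math.floor(fx * x / z + cx)
        v = math.floor(fy * y / z + cy)
        # Points outside the image receive no label from this frame.
        out.append(label_image[v][u] if 0 <= u < w and 0 <= v < h else None)
    return out
```

For example, with an identity pose, `fx = fy = 2`, `cx = cy = 2`, and a 4×4 label image, the point `(0, 0, 1)` projects to pixel `(2, 2)`, while a point behind the camera or outside the field of view stays unlabelled. In an incremental system those per-point labels would then be accumulated frame by frame in the map.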