🤖 AI Summary
Existing large-scale scene reconstruction methods inadequately exploit LiDAR’s physical and semantic attributes, and attach multimodal inputs to each Gaussian as separate parameters, resulting in excessive memory consumption and weak cross-modal interaction. To address this, we propose a compact, learnable embedding mechanism that implicitly encodes image appearance, LiDAR geometry, and semantics into a unified per-Gaussian representation, reducing GPU memory footprint while strengthening inter-modal synergy. Built upon the neural Gaussian splatting framework, our method jointly optimizes optical, physical, and semantic features, and incorporates lightweight neural decoders for end-to-end reconstruction. Evaluated on Oxford Spires, our approach achieves higher-fidelity 3D reconstructions; on KITTI-360, it attains competitive novel-view synthesis performance with 37% less storage than baseline methods.
📝 Abstract
This paper proposes Neural-MMGS, a novel neural 3DGS framework for multimodal large-scale scene reconstruction that fuses multiple sensing modalities into a compact, learnable per-Gaussian embedding. While recent works on large-scale scene reconstruction have incorporated LiDAR data to provide more accurate geometric constraints, we argue that LiDAR's rich physical properties remain underexplored. Similarly, semantic information has been used for object retrieval, but it could also provide valuable high-level context for scene reconstruction. Traditional approaches append these properties to Gaussians as separate parameters, increasing memory usage and limiting information exchange across modalities. Instead, our approach fuses all modalities -- image, LiDAR, and semantics -- into a compact, learnable embedding that implicitly encodes optical, physical, and semantic features in each Gaussian. We then train lightweight neural decoders to map these embeddings to Gaussian parameters, enabling the reconstruction of each sensing modality with lower memory overhead and improved scalability. We evaluate Neural-MMGS on the Oxford Spires and KITTI-360 datasets. On Oxford Spires, we achieve higher-quality reconstructions, while on KITTI-360, our method achieves competitive results with lower storage consumption than current approaches to LiDAR-based novel-view synthesis.
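The core idea -- one shared learnable embedding per Gaussian, decoded into modality-specific parameters by small neural heads -- can be illustrated with a minimal sketch. This is not the paper's actual architecture: the embedding size, hidden width, class count, and all function names below are hypothetical, and the decoders are plain two-layer MLPs in NumPy rather than the authors' trained decoders.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 32     # hypothetical per-Gaussian embedding size
HIDDEN = 64      # hypothetical decoder hidden width
N_CLASSES = 19   # hypothetical number of semantic classes

def make_head(in_dim, hidden, out_dim):
    """Randomly initialised weights for one lightweight two-layer decoder head."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }

def decode(emb, head):
    """Map per-Gaussian embeddings to one modality's Gaussian parameters."""
    h = np.maximum(emb @ head["W1"] + head["b1"], 0.0)  # ReLU hidden layer
    return h @ head["W2"] + head["b2"]

# One compact embedding per Gaussian replaces separate per-modality parameters.
n_gaussians = 1000
embeddings = rng.normal(0.0, 0.1, (n_gaussians, EMB_DIM))

color_head = make_head(EMB_DIM, HIDDEN, 3)             # optical: RGB
intensity_head = make_head(EMB_DIM, HIDDEN, 1)         # physical: LiDAR intensity
semantic_head = make_head(EMB_DIM, HIDDEN, N_CLASSES)  # semantic: class logits

colors = decode(embeddings, color_head)        # (1000, 3)
intensity = decode(embeddings, intensity_head) # (1000, 1)
semantics = decode(embeddings, semantic_head)  # (1000, 19)
```

The memory argument follows directly: storing color, intensity, and semantic logits explicitly would cost 3 + 1 + 19 = 23 values per Gaussian per modality set (and more with spherical-harmonic color), while the shared embedding costs a fixed EMB_DIM values plus the decoder weights, which are amortized across all Gaussians; the joint embedding also lets gradients from every modality shape the same representation.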