🤖 AI Summary
Existing large-scale scene reconstruction methods inadequately exploit LiDAR’s physical and semantic attributes, and attach multimodal inputs to each Gaussian as separate parameters, resulting in excessive memory consumption and weak cross-modal interaction. To address this, we propose a compact, learnable embedding mechanism that implicitly encodes image appearance, LiDAR geometry, and semantics into a unified per-Gaussian representation, reducing GPU memory footprint while strengthening inter-modal synergy. Built upon the neural Gaussian splatting framework, our method jointly optimizes optical, physical, and semantic features, and incorporates lightweight neural decoders for end-to-end reconstruction. Evaluated on Oxford Spires, our approach achieves higher-fidelity 3D reconstructions; on KITTI-360, it attains competitive novel-view synthesis performance with 37% less storage than baseline methods.
📝 Abstract
This paper proposes Neural-MMGS, a novel neural 3DGS framework for multimodal large-scale scene reconstruction that fuses multiple sensing modalities into a compact, learnable per-Gaussian embedding. While recent works on large-scale scene reconstruction have incorporated LiDAR data to provide more accurate geometric constraints, we argue that LiDAR's rich physical properties remain underexplored. Similarly, semantic information has been used for object retrieval, but it could also provide valuable high-level context for scene reconstruction. Traditional approaches append these properties to Gaussians as separate parameters, increasing memory usage and limiting information exchange across modalities. Instead, our approach fuses all modalities -- image, LiDAR, and semantics -- into a compact, learnable embedding that implicitly encodes optical, physical, and semantic features in each Gaussian. We then train lightweight neural decoders to map these embeddings to Gaussian parameters, enabling the reconstruction of each sensing modality with lower memory overhead and improved scalability. We evaluate Neural-MMGS on the Oxford Spires and KITTI-360 datasets. On Oxford Spires, we achieve higher-quality reconstructions, while on KITTI-360, our method achieves competitive results with lower storage consumption than current approaches to LiDAR-based novel-view synthesis.
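The core idea -- one shared learnable embedding per Gaussian, decoded into modality-specific parameters by small neural heads -- can be illustrated with a minimal sketch. This is not the paper's actual architecture: the embedding size, hidden width, class count, and all function names below are hypothetical, and the decoders are plain two-layer MLPs in NumPy rather than the authors' trained decoders.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 32     # hypothetical per-Gaussian embedding size
HIDDEN = 64      # hypothetical decoder hidden width
N_CLASSES = 19   # hypothetical number of semantic classes

def make_head(in_dim, hidden, out_dim):
    """Randomly initialised weights for one lightweight two-layer decoder head."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }

def decode(emb, head):
    """Map per-Gaussian embeddings to one modality's Gaussian parameters."""
    h = np.maximum(emb @ head["W1"] + head["b1"], 0.0)  # ReLU hidden layer
    return h @ head["W2"] + head["b2"]

# One compact embedding per Gaussian replaces separate per-modality parameters.
n_gaussians = 1000
embeddings = rng.normal(0.0, 0.1, (n_gaussians, EMB_DIM))

color_head = make_head(EMB_DIM, HIDDEN, 3)             # optical: RGB
intensity_head = make_head(EMB_DIM, HIDDEN, 1)         # physical: LiDAR intensity
semantic_head = make_head(EMB_DIM, HIDDEN, N_CLASSES)  # semantic: class logits

colors = decode(embeddings, color_head)        # (1000, 3)
intensity = decode(embeddings, intensity_head) # (1000, 1)
semantics = decode(embeddings, semantic_head)  # (1000, 19)
```

The memory argument follows directly: storing color, intensity, and semantic logits explicitly would cost 3 + 1 + 19 = 23 values per Gaussian per modality set (and more with spherical-harmonic color), while the shared embedding costs a fixed EMB_DIM values plus the decoder weights, which are amortized across all Gaussians; the joint embedding also lets gradients from every modality shape the same representation.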