Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats

📅 2024-10-16

🏛️ arXiv.org

📈 Citations: 30

✨ Influential: 1

🤖 AI Summary

To address the challenges of long-sequence modeling and slow inference in high-resolution, 360° 3D Gaussian reconstruction of large scenes, this paper proposes an end-to-end feed-forward reconstruction model that combines Mamba2 and transformer blocks. The hybrid architecture lets the network process far more tokens than prior feed-forward models, and token merging and Gaussian pruning trade off quality against efficiency. A single forward pass predicts 3D Gaussians, which are rendered with differentiable Gaussian splatting. From 32 input images at 960×540, it reconstructs the full scene in 1.3 seconds on a single A100 GPU, two orders of magnitude faster than optimization-based methods. Quantitative and qualitative evaluations on DL3DV-140 and Tanks and Temples show reconstruction quality comparable to state-of-the-art optimization-based approaches. This makes wide-coverage, high-fidelity 3D Gaussian splatting reconstruction possible in a single feed-forward step.

📝 Abstract

We propose Long-LRM, a generalizable 3D Gaussian reconstruction model that is capable of reconstructing a large scene from a long sequence of input images. Specifically, our model can process 32 source images at 960x540 resolution within only 1.3 seconds on a single A100 80G GPU. Our architecture features a mixture of the recent Mamba2 blocks and the classical transformer blocks, which allows many more tokens to be processed than in prior work, enhanced by efficient token merging and Gaussian pruning steps that balance quality and efficiency. Unlike previous feed-forward models, which are limited to processing 1 to 4 input images and can only reconstruct a small portion of a large scene, Long-LRM reconstructs the entire scene in a single feed-forward step. On large-scale scene datasets such as DL3DV-140 and Tanks and Temples, our method achieves performance comparable to optimization-based approaches while being two orders of magnitude more efficient. Project page: https://arthurhero.github.io/projects/llrm
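The abstract does not say how token merging is implemented. As a minimal sketch, assume it merges each 2×2 neighborhood of patch tokens in a view's token grid into one token by concatenating their features and applying a learned linear projection. This is mathematically equivalent to a 2×2 convolution with stride 2. The function name `merge_tokens_2x2` and the projection matrix `weight` are hypothetical. The sketch only shows that one such step cuts the token count by 4× for every block that follows it.

```python
import numpy as np


def merge_tokens_2x2(tokens: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Merge every 2x2 neighborhood of tokens into one token (hypothetical sketch).

    tokens: (V, H, W, C) array, an H x W grid of C-dim tokens for each of V views.
    weight: (4 * C, C_out) array standing in for a learned projection.
    Returns a (V, H // 2, W // 2, C_out) array, i.e. 4x fewer tokens.
    """
    v, h, w, c = tokens.shape
    if h % 2 or w % 2:
        raise ValueError("token grid height and width must both be even")
    # Split each spatial axis into (blocks, 2) so each 2x2 neighborhood
    # gets its own pair of size-2 axes.
    x = tokens.reshape(v, h // 2, 2, w // 2, 2, c)
    # Move the two size-2 axes next to the channels and flatten them, giving
    # 4 * C concatenated features per merged token (row-major within the 2x2).
    x = x.transpose(0, 1, 3, 2, 4, 5).reshape(v, h // 2, w // 2, 4 * c)
    # Project the concatenated features down to C_out channels.
    return x @ weight


rng = np.random.default_rng(0)
tokens = rng.standard_normal((2, 4, 6, 8))  # 2 views, 4 x 6 token grid, 8 channels
weight = rng.standard_normal((4 * 8, 16))
merged = merge_tokens_2x2(tokens, weight)
print(tokens.shape[:3], "->", merged.shape[:3])  # (2, 4, 6) -> (2, 2, 3)
```

Where Long-LRM places its merge step in the network is not stated in this summary. Merging partway through, rather than at the input, would let early blocks see full-resolution tokens while later blocks run on a quarter as many.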

Problem

Research questions and friction points this paper is trying to address.

Instant high-resolution 360° 3D Gaussian reconstruction

Efficiently handling long input sequences of roughly 250K tokens

Achieving an 800× speedup over optimization-based methods
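The 250K figure follows from simple patch arithmetic. This sketch assumes a hypothetical 8×8 patch size, which the summary does not state. Under that assumption, 32 views at 960×540 yield about 259K tokens, and full self-attention over that sequence scores every pair of tokens. For comparison, the sketch also counts 4 views at the same resolution, the upper end of what the abstract says earlier feed-forward models accept. It shows that 8× more tokens means 64× more attention scores.

```python
PATCH = 8  # hypothetical patch size; not stated in this summary


def num_tokens(views: int, height: int, width: int, patch: int = PATCH) -> int:
    """Approximate patch-token count: total pixels divided by patch area.

    540 is not a multiple of 8, so a real patchifier would crop or pad;
    dividing the total pixel count keeps the estimate simple.
    """
    return views * height * width // (patch * patch)


def attention_scores(n_tokens: int) -> int:
    """Query-key scores per head for one full self-attention layer (quadratic)."""
    return n_tokens * n_tokens


long_seq = num_tokens(32, 960, 540)
short_seq = num_tokens(4, 960, 540)
print(f"32 views: {long_seq:,} tokens")  # 32 views: 259,200 tokens
print(f"4 views: {short_seq:,} tokens")  # 4 views: 32,400 tokens
print("attention cost ratio:", attention_scores(long_seq) // attention_scores(short_seq))  # 64
```

A linear-time layer, such as the state-space scan in Mamba2, grows only 8× over the same change. This is consistent with the abstract's point that the Mamba2 blocks let the model process many more tokens than prior work.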

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid architecture mixing Mamba2 and transformer blocks

Lightweight token merging module

Gaussian pruning to balance quality and efficiency
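The summary gives no pruning criterion. As a minimal sketch, this example substitutes opacity-based pruning, a common heuristic for 3D Gaussian splats, rather than taking a criterion from the paper. It keeps a fixed fraction of the most opaque Gaussians and drops any that fall below an opacity floor. `prune_gaussians`, `keep_ratio`, and `min_opacity` are hypothetical names.

```python
import numpy as np


def prune_gaussians(params, opacity, keep_ratio=0.5, min_opacity=0.0):
    """Keep the most opaque Gaussians (hypothetical opacity-based criterion).

    params:      (N, D) per-Gaussian attributes (position, scale, rotation, color, ...).
    opacity:     (N,) opacities in [0, 1].
    keep_ratio:  fraction of Gaussians to retain before the opacity floor is applied.
    min_opacity: Gaussians with opacity below this value are always dropped.
    Returns (kept_params, kept_opacity, kept_indices), indices in original order.
    """
    n = opacity.shape[0]
    if n == 0:
        return params[:0], opacity[:0], np.empty(0, dtype=np.intp)
    k = min(n, max(1, int(round(keep_ratio * n))))
    # argpartition places the k largest opacities first without a full sort.
    top = np.argpartition(-opacity, k - 1)[:k]
    # Apply the opacity floor, then restore the original (per-pixel) ordering.
    top = np.sort(top[opacity[top] >= min_opacity])
    return params[top], opacity[top], top


opacity = np.array([0.1, 0.9, 0.5, 0.05, 0.7, 0.3])
params = np.arange(12, dtype=float).reshape(6, 2)
kept, kept_opacity, idx = prune_gaussians(params, opacity, keep_ratio=0.5)
print(idx)  # [1 2 4]
```

Ranking by opacity, rather than relying on a fixed threshold alone, bounds the number of surviving Gaussians. Rendering cost is therefore capped regardless of scene content, while the floor still removes nearly transparent splats.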