LONG3R: Long Sequence Streaming 3D Reconstruction

📅 2025-07-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing streaming multi-view reconstruction methods struggle to balance long-sequence processing with real-time performance: offline optimization is too slow for streaming use, while lightweight approaches are limited to short sequences. This paper introduces LONG3R, a recurrent model for real-time 3D reconstruction from long streaming multi-view inputs. It has three core components: (1) a memory gating mechanism that filters the memory relevant to each new observation; (2) a dual-source refined decoder in which the filtered memory and the new observation interact coarse-to-fine; and (3) a 3D spatio-temporal memory that dynamically prunes redundant spatial information and adapts its resolution to the scene. A two-stage curriculum training strategy improves long-sequence performance while keeping training efficient. On long-sequence benchmarks, the method outperforms prior streaming approaches while maintaining real-time inference speed (>15 FPS).

📝 Abstract
Recent advancements in multi-view scene reconstruction have been significant, yet existing methods face limitations when processing streams of input images. These methods either rely on time-consuming offline optimization or are restricted to shorter sequences, hindering their applicability in real-time scenarios. In this work, we propose LONG3R (LOng sequence streaming 3D Reconstruction), a novel model designed for streaming multi-view 3D scene reconstruction over longer sequences. Our model achieves real-time processing by operating recurrently, maintaining and updating memory with each new observation. We first employ a memory gating mechanism to filter relevant memory, which, together with a new observation, is fed into a dual-source refined decoder for coarse-to-fine interaction. To effectively capture long-sequence memory, we propose a 3D spatio-temporal memory that dynamically prunes redundant spatial information while adaptively adjusting resolution along the scene. To enhance our model's performance on long sequences while maintaining training efficiency, we employ a two-stage curriculum training strategy, each stage targeting specific capabilities. Experiments demonstrate that LONG3R outperforms state-of-the-art streaming methods, particularly for longer sequences, while maintaining real-time inference speed. Project page: https://zgchen33.github.io/LONG3R/.
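The recurrent loop described in the abstract (gate the memory, fuse it with the new observation, then update the memory) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `StreamingMemory` class, the cosine-similarity gate, the `gate_threshold` value, and the averaging step that stands in for the dual-source decoder are hypothetical and are not taken from the LONG3R implementation.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class StreamingMemory:
    # Hypothetical similarity cutoff; not a value from the paper.
    gate_threshold: float = 0.5
    entries: list = field(default_factory=list)

    def gate(self, query):
        """Return only the stored features similar enough to the query."""
        return [m for m in self.entries if cosine(m, query) >= self.gate_threshold]

    def step(self, feature):
        """Process one new observation, then append it to the memory."""
        relevant = self.gate(feature)
        # Stand-in for the decoder: average the observation with gated memory.
        group = [feature] + relevant
        fused = [sum(vals) / len(group) for vals in zip(*group)]
        self.entries.append(list(feature))
        return fused, len(relevant)


# Usage: feed per-frame feature vectors one at a time, as in a stream.
memory = StreamingMemory()
for frame_feature in [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]:
    fused, num_used = memory.step(frame_feature)
```

The point of the sketch is the control flow: each frame is processed once, only a filtered subset of memory takes part in the fusion, and the per-frame cost depends on the gated subset rather than on the full history.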
Problem

Research questions and friction points this paper is trying to address.

Real-time 3D reconstruction from long image streams
Efficient memory handling for extended sequence processing
Dynamic resolution adjustment in 3D spatio-temporal memory
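One way to picture spatial pruning with an adaptive resolution is a voxel grid whose cell size follows the extent of the scene seen so far, keeping a single memory token per cell. This is a minimal sketch under stated assumptions: the function `prune_memory`, the `cells_per_axis` parameter, the per-token weight, and the extent-based rule for the voxel size are all hypothetical, and the abstract does not specify how LONG3R sets its resolution.

```python
def prune_memory(tokens, cells_per_axis=8):
    """Keep the highest-weight token in each voxel of an adaptive grid.

    tokens: list of (position, weight) pairs, position being an (x, y, z) tuple.
    The voxel size is derived from the current scene extent (an assumption),
    so a larger scene is stored at a coarser resolution.
    """
    if not tokens:
        return []
    mins = [min(pos[i] for pos, _ in tokens) for i in range(3)]
    maxs = [max(pos[i] for pos, _ in tokens) for i in range(3)]
    extent = max(maxs[i] - mins[i] for i in range(3))
    voxel = max(extent / cells_per_axis, 1e-6)  # avoid a zero-size voxel

    best = {}
    for pos, weight in tokens:
        key = tuple(int((pos[i] - mins[i]) // voxel) for i in range(3))
        # Tokens that share a voxel are treated as redundant.
        if key not in best or weight > best[key][1]:
            best[key] = (pos, weight)
    return list(best.values())


# Usage: two nearly coincident tokens collapse into one.
kept = prune_memory(
    [((0.0, 0.0, 0.0), 0.9), ((0.01, 0.0, 0.0), 0.5), ((1.0, 1.0, 1.0), 0.7)],
    cells_per_axis=4,
)
```

Because the grid is tied to the scene extent, the number of occupied cells is bounded by `cells_per_axis ** 3` regardless of how many frames have been seen, which is the property a long-sequence memory needs.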
Innovation

Methods, ideas, or system contributions that make the work stand out.

Recurrent processing for real-time 3D reconstruction
3D spatio-temporal memory for long sequences
Two-stage curriculum training for efficiency
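A two-stage curriculum can be expressed as a small schedule that maps a training step to the active stage. The sketch below assumes the first stage trains on short sequences and the second on longer ones; that ordering, the `Stage` fields, and every number in `SCHEDULE` are hypothetical placeholders, since the abstract only says that each stage targets specific capabilities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    num_steps: int  # how many training steps the stage lasts
    seq_len: int    # frames per training sequence in this stage


def stage_for_step(schedule, step):
    """Return the stage active at a given global training step."""
    start = 0
    for stage in schedule:
        if step < start + stage.num_steps:
            return stage
        start += stage.num_steps
    return schedule[-1]  # past the end: stay in the final stage


# Placeholder values for illustration only, not the paper's settings.
SCHEDULE = [
    Stage("short_sequences", num_steps=1000, seq_len=5),
    Stage("long_sequences", num_steps=500, seq_len=32),
]

current = stage_for_step(SCHEDULE, step=1200)
```

Keeping the long-sequence stage short relative to the first is one way such a schedule could limit training cost, since per-step cost grows with sequence length.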
Authors

Zhuoguang Chen
Shanghai Artificial Intelligence Laboratory
Minghui Qin
IIIS, Tsinghua University
Tianyuan Yuan
Tsinghua University
Computer Vision
Zhe Liu
IIIS, Tsinghua University
Hang Zhao
Shanghai Artificial Intelligence Laboratory, IIIS, Tsinghua University, Shanghai Qi Zhi Institute