Stream3D: Sequential Multi-View 3D Generation via Evidential Memory

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing single-view-conditioned 3D generation models synthesize each frame independently, so their outputs become temporally inconsistent on continuous visual streams. This work proposes a training-free, plug-and-play streaming 3D generation method that targets long-term consistency without architectural modifications or additional losses. It keeps a compact memory bank that is updated dynamically according to evidence scores, so memory overhead stays constant as the stream grows while the 3D reconstruction remains coherent over time. The method integrates a frozen view-conditioned 3D generator with a streaming multi-view fusion mechanism, which lets it slot into existing pipelines. On both real-world and synthetic streaming datasets, it outperforms baseline approaches, including key-value (KV) cache reuse and optical-flow-based feature editing, on photometric and geometric metrics.

📝 Abstract

View-conditioned 3D generators such as SAM 3D, TRELLIS and Hunyuan3D produce high-quality object reconstructions from a single view, but real-world visual observations often arrive as long monocular streams. Naively applying these generators to each streaming frame independently leads to severe temporal inconsistency in the generated results. To address this problem, we propose Stream3D, the first training-free streaming mechanism that turns a frozen view-conditioned 3D generator into a streaming generator with constant cross-chunk memory. Stream3D achieves this by maintaining a compact evidential memory, which selectively caches the most informative historical frames according to a proposed evidence score. As the stream progresses, the memory is updated dynamically to retain a fixed number of informative frames, so the memory footprint does not grow linearly with sequence length. This also prevents degradation over long sequences and leaves the underlying generator completely unchanged, with no retraining, architectural modifications, or auxiliary losses. Evaluated on both realistic and synthetic streaming benchmarks, Stream3D outperforms latent-transport baselines, including KV-cache reuse and flow-based feature editing, across both photometric and geometric metrics. More details can be found at: https://anonymous-submission-20.github.io/streaming3D.github.io/.
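
The fixed-size memory described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not define the evidence score, so the code assumes it is a single scalar per frame supplied by the caller, and the names `EvidentialMemory`, `update`, and `frame_ids` are hypothetical. It only shows why keeping the top-K frames by score gives a memory footprint that is independent of stream length.

```python
import heapq
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class EvidentialMemory:
    """Hypothetical fixed-capacity frame cache ranked by a scalar evidence score.

    Assumption: a higher score means a more informative frame. How Stream3D
    actually computes the score is not stated in the abstract.
    """
    capacity: int
    # Min-heap of (score, frame_id, frame); the weakest entry sits at index 0.
    _heap: List[Tuple[float, int, Any]] = field(default_factory=list)

    def update(self, frame_id: int, frame: Any, score: float) -> None:
        entry = (score, frame_id, frame)
        if len(self._heap) < self.capacity:
            # Still room: always keep the frame.
            heapq.heappush(self._heap, entry)
        elif score > self._heap[0][0]:
            # Full: evict the lowest-scoring frame in favour of the new one.
            heapq.heapreplace(self._heap, entry)
        # Otherwise the new frame is less informative than everything cached.

    def frame_ids(self) -> List[int]:
        """Ids of the cached frames, in stream order."""
        return sorted(fid for _, fid, _ in self._heap)


# Toy stream of six frames with made-up evidence scores.
scores = [0.2, 0.9, 0.1, 0.7, 0.4, 0.8]
memory = EvidentialMemory(capacity=3)
for frame_id, score in enumerate(scores):
    memory.update(frame_id, frame=f"frame-{frame_id}", score=score)

print(memory.frame_ids())  # [1, 3, 5]
```

Each update costs O(log K) for a capacity of K frames, and the cache never holds more than K entries however long the stream runs. In a real pipeline the cached frames would presumably be passed to the frozen generator as extra conditioning views, but that interface is not described in the abstract.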

Problem

Research questions and friction points this paper is trying to address.

temporal inconsistency

streaming 3D generation

view-conditioned 3D generator

monocular video stream

sequential multi-view

Innovation

Methods, ideas, or system contributions that make the work stand out.

Stream3D

evidential memory

streaming 3D generation