FlowR: Flowing from Sparse to Dense 3D Reconstructions

📅 2025-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address degraded reconstruction quality, inconsistent novel view synthesis, and geometric artifacts under sparse-view settings, this paper proposes a multi-view generative approach based on flow matching. The method learns a flow that maps novel-view renderings from possibly sparse reconstructions to the distribution of renderings expected from dense reconstructions, so that generated novel views can augment the scene capture and improve 3D Gaussian splatting reconstruction. The model is trained on a large-scale dataset of 3.6 million image pairs and processes up to 45 views at 540×960 resolution (91K tokens) on a single H100 GPU in one forward pass. Experiments show consistent improvements over prior work on widely used novel view synthesis (NVS) benchmarks, in both sparse- and dense-view scenarios.
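The core idea of flow matching, as the summary describes it, is to regress a velocity field that transports samples from a source distribution (here, renderings from sparse reconstructions) to a target distribution (renderings expected from dense reconstructions), then integrate that field at inference time. The sketch below is a toy illustration of this training target and Euler-step sampling on 2D points, not the paper's model: the paired data, the closed-form "model", and all variable names are stand-ins invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: "sparse renderings" x0 and paired "dense
# reconstruction" targets x1. FlowR pairs real renderings; here the pairing
# is a fixed shift so the optimal flow is known exactly.
x0 = rng.normal(0.0, 1.0, size=(256, 2))   # degraded source samples
x1 = x0 + np.array([2.0, -1.0])            # paired clean targets

def interpolant(x0, x1, t):
    """Linear probability path x_t = (1 - t) * x0 + t * x1."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Velocity of the linear path: dx_t/dt = x1 - x0 (the regression target)."""
    return x1 - x0

# A flow matching model v_theta(x_t, t) would be trained to minimize
# || v_theta(x_t, t) - (x1 - x0) ||^2 over random t in [0, 1].
# For this toy pairing the optimal velocity is a constant, so we use the
# empirical mean displacement as a stand-in for a trained network.
v_theta = target_velocity(x0, x1).mean(axis=0)

def euler_sample(x, v, steps=10):
    """Integrate dx/dt = v from t=0 (source sample) to t=1 (generated sample)."""
    dt = 1.0 / steps
    for _ in range(steps):
        x = x + dt * v
    return x

x_gen = euler_sample(x0, v_theta)
# The generated samples are shifted by the learned velocity, i.e. by ~[2, -1].
print(np.round(x_gen.mean(axis=0) - x0.mean(axis=0), 3))
```

In the paper's setting, x0 and x1 are multi-view image sets rather than 2D points, and v_theta is a large multi-view network; the structure of the objective and the ODE-based sampling, however, follow this pattern.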

📝 Abstract
3D Gaussian splatting enables high-quality novel view synthesis (NVS) at real-time frame rates. However, its quality drops sharply as we depart from the training views. Thus, dense captures are needed to match the high-quality expectations of some applications, e.g. Virtual Reality (VR). However, such dense captures are very laborious and expensive to obtain. Existing works have explored using 2D generative models to alleviate this requirement by distillation or generating additional training views. These methods are often conditioned only on a handful of reference input views and thus do not fully exploit the available 3D information, leading to inconsistent generation results and reconstruction artifacts. To tackle this problem, we propose a multi-view flow matching model that learns a flow to connect novel view renderings from possibly sparse reconstructions to renderings that we expect from dense reconstructions. This enables augmenting scene captures with novel, generated views to improve reconstruction quality. Our model is trained on a novel dataset of 3.6M image pairs and can process up to 45 views at 540×960 resolution (91K tokens) on one H100 GPU in a single forward pass. Our pipeline consistently improves NVS in sparse- and dense-view scenarios, leading to higher-quality reconstructions than prior works across multiple widely used NVS benchmarks.
Problem

Research questions and friction points this paper is trying to address.

Sharp quality drop of 3D Gaussian splatting away from the training views
Laborious, expensive dense captures needed for high-quality applications such as VR
Inconsistent generations and reconstruction artifacts when 2D generative models are conditioned on only a few reference views
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-view flow matching model connecting sparse-reconstruction renderings to dense-reconstruction renderings
Novel training dataset of 3.6M image pairs; up to 45 views at 540×960 (91K tokens) in a single forward pass on one H100 GPU
Generated novel views augment scene captures, improving NVS quality in both sparse- and dense-view scenarios