🤖 AI Summary
Sparse-view 3D scene reconstruction suffers from inaccurate localization in low-texture or repetitive regions, severe artifacts, and local noise coupled with global inconsistency, stemming primarily from unreliable feature matching in the absence of ground-truth optical flow supervision. To address these challenges, we propose a probabilistic joint optical flow–depth optimization framework. Our method introduces a novel pixel-wise, matching-probability-driven co-optimization mechanism that dynamically fuses flow and depth information via uncertainty-aware weighting. We further design a reliability-aware multi-view depth consistency loss that suppresses gradient bias from uncertain regions while preserving effective geometric supervision. Built upon Gaussian splatting representations, our approach tightly integrates uncertainty estimation in correspondence matching with explicit geometric consistency modeling. Extensive experiments on RealEstate10K and ACID demonstrate significant improvements over state-of-the-art methods, particularly in challenging low-texture and repetitive regions, yielding superior reconstruction accuracy and novel-view synthesis quality.
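The summary describes a pixel-wise fusion of flow-derived and network-predicted depth, weighted by the optical-flow matching probability. The paper's exact formulation is not given here, so the following is a minimal NumPy sketch of the general idea under stated assumptions: `match_prob` is a per-pixel confidence in [0, 1], and the fusion is a simple convex combination (the function name `fuse_depth` and both depth inputs are hypothetical, not from the paper).

```python
import numpy as np

def fuse_depth(depth_pred: np.ndarray,
               depth_from_flow: np.ndarray,
               match_prob: np.ndarray) -> np.ndarray:
    """Hypothetical uncertainty-aware fusion sketch.

    Where the flow match is confident (match_prob near 1), trust the
    flow-triangulated depth; where it is unreliable (near 0), fall back
    to the feed-forward depth prediction.
    """
    w = np.clip(match_prob, 0.0, 1.0)  # per-pixel weight in [0, 1]
    return w * depth_from_flow + (1.0 - w) * depth_pred

# Toy example: at a confident pixel (w=1) the fused depth equals the
# flow-derived depth; at w=0.5 it is the midpoint of the two sources.
depth_pred = np.array([[2.0, 2.0]])
depth_flow = np.array([[4.0, 4.0]])
prob = np.array([[1.0, 0.5]])
fused = fuse_depth(depth_pred, depth_flow, prob)  # → [[4.0, 3.0]]
```

This convex-combination form is only one way to "scale the information fusion" by matching probability; the actual mechanism in JointSplat may use learned or non-linear weighting.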
📝 Abstract
Reconstructing 3D scenes from sparse viewpoints is a long-standing challenge with wide applications. Recent advances in feed-forward 3D Gaussian sparse-view reconstruction methods provide an efficient solution for real-time novel view synthesis by leveraging geometric priors learned from large-scale multi-view datasets and computing 3D Gaussian centers via back-projection. Despite offering strong geometric cues, both feed-forward multi-view depth estimation and joint flow-depth estimation face key limitations: the former suffers from mislocalization and artifacts in low-texture or repetitive regions, while the latter is prone to local noise and global inconsistency due to unreliable matches when ground-truth flow supervision is unavailable. To overcome these limitations, we propose JointSplat, a unified framework that leverages the complementarity between optical flow and depth via a novel probabilistic optimization mechanism. Specifically, this pixel-level mechanism scales the information fusion between depth and flow based on the matching probability of optical flow during training. Building upon this mechanism, we further propose a novel multi-view depth-consistency loss that leverages the reliability of supervision while suppressing misleading gradients in uncertain areas. Evaluated on RealEstate10K and ACID, JointSplat consistently outperforms state-of-the-art (SOTA) methods, demonstrating the effectiveness and robustness of our probabilistic joint flow-depth optimization approach for high-fidelity sparse-view 3D reconstruction.
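The abstract's second contribution is a multi-view depth-consistency loss that down-weights unreliable regions so they do not contribute misleading gradients. The paper's loss is not spelled out here, so the following is a minimal NumPy sketch under stated assumptions: `reliability` is a per-pixel score in [0, 1], pixels below a hypothetical threshold `tau` are masked out entirely, and the remaining cross-view depth residuals are reliability-weighted (the function name `consistency_loss` and the warped-depth input are illustrative, not from the paper).

```python
import numpy as np

def consistency_loss(depth_a: np.ndarray,
                     depth_b_warped: np.ndarray,
                     reliability: np.ndarray,
                     tau: float = 0.5) -> float:
    """Hypothetical reliability-aware depth-consistency sketch.

    depth_b_warped is assumed to be view B's depth warped into view A's
    frame. Pixels with reliability <= tau are masked out (no gradient);
    the rest are penalized by a reliability-weighted L1 residual.
    """
    mask = reliability > tau                     # drop uncertain pixels
    residual = np.abs(depth_a - depth_b_warped)  # cross-view disagreement
    weighted = reliability * residual * mask     # soft-weight survivors
    denom = max(int(mask.sum()), 1)              # avoid division by zero
    return float(weighted.sum()) / denom

# Toy example: the unreliable pixel's large residual is fully suppressed.
depth_a = np.array([[1.0, 2.0]])
depth_b = np.array([[1.0, 3.0]])
rel = np.array([[0.9, 0.2]])
loss = consistency_loss(depth_a, depth_b, rel)  # → 0.0
```

Masking plus soft weighting is one common pattern for "suppressing misleading gradients in uncertain areas"; JointSplat's actual loss may differ in the residual norm or weighting scheme.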