🤖 AI Summary
This work addresses novel-view synthesis for human-centric scenes from sparse stereo inputs. To tackle geometric instability under low image overlap, we propose a two-stage learning framework: first, scale-aware point map reconstruction coupled with stereo matching refinement improves structural fidelity; second, the reconstructed point map is integrated with Gaussian splatting to enable feed-forward free-viewpoint rendering without requiring dense multi-view inputs or 3D ground-truth supervision. Our method introduces three key innovations: (i) iterative affinity learning for robust correspondence estimation, (ii) point-map-projection-guided anchoring of Gaussian primitives, and (iii) a self-supervised photometric loss for end-to-end optimization. Evaluated on a newly constructed multi-view human-scene dataset, our approach substantially improves both the robustness of point map reconstruction and rendered image quality, enabling high-fidelity novel-view synthesis even under large-baseline, extremely sparse settings (e.g., only 2–4 stereo image pairs).
📝 Abstract
We present Splat-SAP, a feed-forward approach for rendering novel views of human-centered scenes from sparsely placed binocular cameras. Gaussian Splatting has shown promising potential in rendering tasks, but it typically requires per-scene optimization with dense input views. Although some recent approaches achieve feed-forward Gaussian Splatting rendering through geometry priors obtained by multi-view stereo, they still require substantially overlapping input views to establish the geometry prior. To bridge this gap, we represent geometry with pixel-wise point map reconstruction, which is robust to sparse inputs because each view is modeled independently. Concretely, we propose a two-stage learning strategy. In stage 1, we transform the point map into real space via an iterative affinity learning process, which facilitates camera control in the subsequent stage. In stage 2, we project the point maps of the two input views onto the target view plane and refine this geometry via stereo matching. We then anchor Gaussian primitives on the refined plane to render high-quality images. As a metric representation, the scale-aware point map in stage 1 is trained in a self-supervised manner without 3D supervision, and stage 2 is supervised with a photometric loss. We collect multi-view human-centered data and demonstrate that our method improves both the stability of point map reconstruction and the visual quality of free-viewpoint rendering.
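To make the stage-2 pipeline concrete, the sketch below illustrates the two generic operations the abstract describes: projecting a per-pixel point map onto a target camera's image plane, and scoring a rendered image against the captured target view with a photometric L1 loss. This is a minimal illustration, not the paper's implementation: the function names, the standard pinhole camera model, and the world-to-camera convention (`R`, `t`) are all assumptions on our part.

```python
import numpy as np

def project_point_map(points, K, R, t):
    """Project a per-pixel point map (H, W, 3) in world coordinates
    onto a target camera's image plane (assumed pinhole model).

    K: 3x3 intrinsics; R: 3x3 rotation and t: (3,) translation
    mapping world coordinates into the target camera frame.
    Returns pixel coordinates (H, W, 2) and depth (H, W)."""
    H, W, _ = points.shape
    pts = points.reshape(-1, 3)                     # (N, 3)
    cam = pts @ R.T + t                             # world -> camera frame
    depth = cam[:, 2]                               # z in the target camera
    proj = cam @ K.T                                # apply intrinsics
    uv = proj[:, :2] / np.clip(proj[:, 2:3], 1e-6, None)  # perspective divide
    return uv.reshape(H, W, 2), depth.reshape(H, W)

def photometric_l1(rendered, target):
    """Self-supervised photometric L1 loss between a rendered image and
    the captured target view (both H x W x 3, float in [0, 1])."""
    return np.abs(rendered - target).mean()
```

In the paper's setting, the projected-and-refined plane would anchor the Gaussian primitives, and the photometric loss would supervise stage 2 without any 3D ground truth; the splatting renderer itself is omitted here.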