Learn Your Scales: Towards Scale-Consistent Generative Novel View Synthesis

📅 2025-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses scale inconsistency in single-image generative novel view synthesis (GNVS), which stems from the inherent scale ambiguity of monocular multi-view capture. It presents a systematic analysis of how scale uncertainty degrades GNVS models, defines new quantitative metrics that measure the scale inconsistency of generated views, and proposes a framework that estimates scene scales jointly with the GNVS model in an end-to-end fashion. Unlike conventional ad-hoc normalization pre-processing, this approach avoids the associated complexity and downsides. Experiments show that the method significantly reduces scale inconsistency and improves the image quality of the resulting GNVS model.

📝 Abstract
Conventional depth-free multi-view datasets are captured using a moving monocular camera without metric calibration. The scales of camera positions in this monocular setting are ambiguous. Previous methods have acknowledged scale ambiguity in multi-view data via various ad-hoc normalization pre-processing steps, but have not directly analyzed the effect of incorrect scene scales on their application. In this paper, we seek to understand and address the effect of scale ambiguity when used to train generative novel view synthesis methods (GNVS). In GNVS, new views of a scene or object can be minimally synthesized given a single image and are, thus, unconstrained, necessitating the use of generative methods. The generative nature of these models captures all aspects of uncertainty, including any uncertainty of scene scales, which act as nuisance variables for the task. We study the effect of scene scale ambiguity in GNVS when sampled from a single image by isolating its effect on the resulting models and, based on these intuitions, define new metrics that measure the scale inconsistency of generated views. We then propose a framework to estimate scene scales jointly with the GNVS model in an end-to-end fashion. Empirically, we show that our method reduces the scale inconsistency of generated views without the complexity or downsides of previous scale normalization methods. Further, we show that removing this ambiguity improves generated image quality of the resulting GNVS model.
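The abstract describes estimating a per-scene scale jointly with the GNVS model, rather than normalizing scales in pre-processing. The paper does not spell out the parameterization here; a minimal sketch of one plausible form, assuming the scale is a learnable per-scene value (stored as a log-scale so it stays positive) that rescales camera translations before they condition the generator:

```python
import math

class SceneScale:
    """Hypothetical per-scene scale parameter, optimized end-to-end
    alongside the GNVS model (optimizer loop omitted here)."""

    def __init__(self):
        # log-scale parameterization keeps exp(log_s) strictly positive
        self.log_s = 0.0

    def apply(self, translation):
        """Rescale a camera translation vector by the current scene scale."""
        s = math.exp(self.log_s)
        return [s * t for t in translation]
```

The names `SceneScale` and `apply` are illustrative, not from the paper; in practice the log-scale would be registered as a trainable parameter so gradients from the generative loss update it jointly with the model weights.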
Problem

Research questions and friction points this paper is trying to address.

Monocular multi-view datasets lack metric calibration, leaving camera-position scales ambiguous.
Prior work applies ad-hoc scale normalization without analyzing how incorrect scene scales affect GNVS.
Scale ambiguity acts as a nuisance variable, degrading the consistency and quality of generated views.
Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end scene scale estimation with GNVS
New metrics for measuring scale inconsistency
Reduced scale ambiguity improves image quality
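One contribution above is a metric for scale inconsistency across generated views. The paper's exact formulation is not reproduced on this page; a minimal sketch of one plausible form, assuming each generated view yields a scale ratio relative to the conditioning view (e.g. from implied depths), with consistency measured as the spread of those ratios in log space:

```python
import math

def scale_inconsistency(scale_ratios):
    """Hypothetical metric: standard deviation of log scale ratios.
    scale_ratios[i] is the scale implied by generated view i relative
    to the conditioning view; 1.0 for every view means no inconsistency,
    so the metric is 0 in that case."""
    logs = [math.log(r) for r in scale_ratios]
    mean = sum(logs) / len(logs)
    var = sum((x - mean) ** 2 for x in logs) / len(logs)
    return math.sqrt(var)
```

Working in log space makes the metric symmetric to over- and under-scaling (a ratio of 2.0 and 0.5 deviate equally); this is an illustrative choice, not necessarily the paper's.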