One View Is Enough! Monocular Training for In-the-Wild Novel View Generation

📅 2026-03-24
🤖 AI Summary
This work addresses a limitation of monocular novel view synthesis: methods typically rely on multi-view image supervision, which constrains the scale and diversity of training data. The authors propose OVIE, a method trained solely on unpaired in-the-wild single images, without any multi-view supervision. During training, a monocular depth estimator serves as a geometric scaffold: the source image is lifted into 3D, a sampled camera transformation is applied, and the result is projected to obtain a pseudo-target view. A masked training formulation restricts the losses to valid regions, so disoccluded areas that the pseudo-target cannot supervise are ignored. At inference time, the model needs no depth estimator or explicit 3D representation, which contributes to its efficiency. Evaluated in a zero-shot setting, OVIE outperforms prior methods, runs 600× faster than the second-best baseline, and is trained on 30 million uncurated images.
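To make the training-time scaffold concrete, here is a minimal pure-Python sketch of the lift, transform and project step. It assumes a pinhole camera whose intrinsics are shared by both views, a rigid transform `R`, `t` applied to source-frame points, and nearest-pixel forward splatting with a z-buffer; the function name `lift_and_project` and its parameters are hypothetical and not taken from the OVIE code. Target pixels that no source point reaches are marked invalid, which is one plausible way to obtain the disocclusion mask the losses are restricted to.

```python
import math


def lift_and_project(image, depth, K, R, t):
    """Hypothetical sketch: warp `image` into a new camera using per-pixel depth.

    image: H x W nested list of pixel values
    depth: H x W nested list of positive depths in the source camera frame
    K:     (fx, fy, cx, cy) pinhole intrinsics, assumed shared by both views
    R, t:  3x3 rotation (nested list) and 3-vector translation mapping
           source-camera points into target-camera coordinates (assumed convention)
    Returns (target, mask); mask is False wherever no source pixel landed,
    i.e. in disoccluded or out-of-view regions.
    """
    fx, fy, cx, cy = K
    H, W = len(image), len(image[0])
    target = [[0.0] * W for _ in range(H)]
    zbuf = [[math.inf] * W for _ in range(H)]
    mask = [[False] * W for _ in range(H)]
    for v in range(H):
        for u in range(W):
            z = depth[v][u]
            # Lift: unproject pixel (u, v) to a 3D point p = z * K^-1 [u, v, 1].
            p = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            # Transform: q = R p + t, the point in the target camera frame.
            q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
            if q[2] <= 0:
                continue  # point is behind the target camera
            # Project with the pinhole model and snap to the nearest pixel.
            u2 = round(fx * q[0] / q[2] + cx)
            v2 = round(fy * q[1] / q[2] + cy)
            if 0 <= u2 < W and 0 <= v2 < H and q[2] < zbuf[v2][u2]:
                # Z-buffer test: when several points hit one pixel, keep the closest.
                zbuf[v2][u2] = q[2]
                target[v2][u2] = image[v][u]
                mask[v2][u2] = True
    return target, mask
```

For example, with unit depth, `K = (1.0, 1.0, 0.0, 0.0)`, identity `R` and `t = [1.0, 0.0, 0.0]`, a 2 x 3 image shifts one pixel to the right and its first column comes back with `mask` set to False. A real pipeline would typically vectorize this and splat with sub-pixel weights; the loop form is only meant to show the geometry.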

📝 Abstract
Monocular novel-view synthesis has long required multi-view image pairs for supervision, limiting training data scale and diversity. We argue it is not necessary: one view is enough. We present OVIE, trained entirely on unpaired internet images. We leverage a monocular depth estimator as a geometric scaffold at training time: we lift a source image into 3D, apply a sampled camera transformation, and project to obtain a pseudo-target view. To handle disocclusions, we introduce a masked training formulation that restricts geometric, perceptual, and textural losses to valid regions, enabling training on 30 million uncurated images. At inference, OVIE is geometry-free, requiring no depth estimator or 3D representation. Trained exclusively on in-the-wild images, OVIE outperforms prior methods in a zero-shot setting, while being 600x faster than the second-best baseline. Code and models are publicly available at https://github.com/AdrienRR/ovie.
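The masked training formulation can be illustrated with a single loss term. The abstract names geometric, perceptual and textural losses without defining them, so the sketch below uses a plain per-pixel L1 error as a hypothetical stand-in; the name `masked_l1` is illustrative only. Pixels outside the valid mask contribute nothing, and, as an assumption of this sketch, the sum is normalized by the number of valid pixels rather than the full image size, so the loss scale does not shrink as disocclusions grow.

```python
def masked_l1(pred, target, mask):
    """Hypothetical stand-in for one masked loss: mean |pred - target| over valid pixels.

    pred, target: H x W nested lists of pixel values
    mask:         H x W nested list of booleans, True where the pseudo-target
                  is valid (False in disoccluded regions)
    Returns 0.0 if no pixel is valid, so an empty mask yields no gradient signal.
    """
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                # Only supervised pixels add to the error and to the normalizer.
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0
```

For example, with `pred = [[9, 1, 2]]`, `target = [[0, 1, 3]]` and `mask = [[False, True, True]]`, the large error in the masked-out pixel is ignored and the loss is 0.5.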

Problem

monocular novel-view synthesis
multi-view supervision
in-the-wild images
training data limitation

Innovation

monocular novel-view synthesis
unpaired training
masked training
geometry-free inference
in-the-wild images