🤖 AI Summary
This work addresses a fundamental conflict in generative novel view synthesis: geometric priors are sparse and inaccurate, while appearance priors lack geometric correspondence. It introduces a structured denoising dynamics mechanism that temporally decouples geometry and appearance within the diffusion process so the two can be optimized together. Early denoising stages use geometric priors to establish a coarse structure; later stages switch to appearance priors to correct geometric inaccuracies and refine fine details. Using a point-cloud-guided geometry-appearance fusion strategy, the method disentangles these two components in both static and dynamic scenes. It is reported to significantly outperform existing approaches, particularly when the point cloud is severely sparse or distorted, enabling robust, high-quality novel view synthesis.
📝 Abstract
Generative novel view synthesis faces a fundamental dilemma: geometric priors provide spatial alignment but become sparse and inaccurate under view changes, while appearance priors offer visual fidelity but lack geometric correspondence. Existing methods either propagate geometric errors throughout generation or suffer from signal conflicts when fusing both statically. We introduce MoCam, which employs structured denoising dynamics to orchestrate a coordinated progression from geometry to appearance within the diffusion process. MoCam first leverages geometric priors in early stages to anchor coarse structures and tolerate their incompleteness, then switches to appearance priors in later stages to actively correct geometric errors and refine details. This design naturally unifies static and dynamic view synthesis by temporally decoupling geometric alignment and appearance refinement within the diffusion process. Experiments demonstrate that MoCam significantly outperforms prior methods, particularly when point clouds contain severe holes or distortions, achieving robust geometry-appearance disentanglement.
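The geometry-to-appearance progression described above can be illustrated with a minimal sketch. This is not MoCam's implementation: the function name `prior_schedule`, the `switch_fraction` parameter, and the hard switch at a single step are all assumptions made for illustration, since the abstract does not say where or how the switch happens.

```python
from typing import List


def prior_schedule(num_steps: int, switch_fraction: float = 0.5) -> List[str]:
    """Label each denoising step with the prior that conditions it.

    Steps are ordered from the first (noisiest) to the last (cleanest).
    Both the hard switch and `switch_fraction` are hypothetical choices,
    not values taken from the paper.
    """
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    if not 0.0 <= switch_fraction <= 1.0:
        raise ValueError("switch_fraction must lie in [0, 1]")

    # Index of the first step that is conditioned on the appearance prior.
    switch_step = int(num_steps * switch_fraction)

    # Early steps anchor coarse structure with the geometric prior;
    # later steps refine details with the appearance prior.
    return [
        "geometry" if step < switch_step else "appearance"
        for step in range(num_steps)
    ]


if __name__ == "__main__":
    for step, prior in enumerate(prior_schedule(8, switch_fraction=0.25)):
        print(f"step {step}: condition on {prior} prior")
```

In a real sampler, the label for each step would select which conditioning signal (for example, a point-cloud rendering or a reference image) is passed to the denoiser at that step. A smooth blend between the two priors would be an equally plausible reading of the abstract.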