🤖 AI Summary
Diffusion models for view synthesis often produce blurred details and distorted structures, caused by pixel-to-latent compression and diffusion hallucination. This work analyzes such degradation along three dimensions (spatial, temporal, and backbone-related) and introduces UniFixer, a universal reference-guided restoration framework. Following a coarse-to-fine strategy, the framework first pre-aligns the reference view with the degraded novel view, then anchors global structure to rectify geometric distortions, and finally injects local detail to recover fine-grained textures. Designed as a plug-and-play refiner, it fixes diverse types of diffusion degradation in a zero-shot manner. Extensive experiments report state-of-the-art performance on novel view synthesis and stereo conversion.
📝 Abstract
With the recent surge of generative models, diffusion-based approaches have become mainstream for view synthesis tasks, following either an explicit depth-warp-inpaint pipeline or an implicit end-to-end design. Despite their success, both paradigms often suffer from noticeable quality degradation, e.g., blurred details and distorted structures, caused by pixel-to-latent compression and diffusion hallucination. In this paper, we investigate diffusion degradation along three key dimensions (i.e., spatial, temporal, and backbone-related) and propose UniFixer, a universal reference-guided framework that fixes diverse degradation artifacts via a coarse-to-fine strategy. Specifically, a reference pre-alignment module first performs coarse alignment between the reference view and the degraded novel view. A global structure anchoring mechanism then rectifies geometric distortions to ensure structural fidelity, followed by a local detail injection module that recovers fine-grained texture details for high-quality view synthesis. UniFixer serves as a plug-and-play refiner that achieves zero-shot fixing across different types of diffusion degradation, and extensive experiments verify its state-of-the-art performance on novel view synthesis and stereo conversion.