🤖 AI Summary
This work proposes NVB-Face, an end-to-end one-stage framework for generating multi-view face images from a single low-quality blind face image, addressing the limitations of existing two-stage approaches, which often suffer from view inconsistency and distortion due to error propagation from the cascaded restoration step. Unlike conventional methods, NVB-Face extracts features directly from the degraded input and employs a feature manipulator to construct 3D-aware latent representations that encode multi-view geometry. A diffusion model then synthesizes high-fidelity, viewpoint-consistent novel views from these latents. By eliminating the intermediate restoration stage, the method achieves superior visual fidelity and cross-view consistency compared with state-of-the-art two-stage techniques.
📝 Abstract
We propose a novel one-stage method, NVB-Face, for generating consistent Novel-View images directly from a single Blind Face image. Existing approaches to novel-view synthesis for objects or faces typically require a high-resolution RGB image as input. When dealing with degraded images, the conventional pipeline follows a two-stage process: first restoring the image to high resolution, then synthesizing novel views from the restored result. However, this approach is highly dependent on the quality of the restored image, often leading to inaccuracies and inconsistencies in the final output. To address this limitation, we extract single-view features directly from the blind face image and introduce a feature manipulator that transforms these features into 3D-aware, multi-view latent representations. Leveraging the powerful generative capacity of a diffusion model, our framework synthesizes high-quality, consistent novel-view face images. Experimental results show that our method significantly outperforms traditional two-stage approaches in both consistency and fidelity.
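The pipeline described above (single-view feature extraction → feature manipulator producing per-view 3D-aware latents → diffusion-based synthesis) can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: all names (`encode`, `FeatureManipulator`, `denoise`) are hypothetical, and each learned network is replaced by a simple closed-form stand-in so the data flow is runnable.

```python
# Toy sketch of a one-stage blind-face novel-view pipeline.
# NOTE: every function here is a hypothetical stand-in for a learned
# network; only the stage ordering mirrors the method described above.
import math
import random

random.seed(0)
DIM = 8  # toy feature dimensionality

def encode(blind_face_image):
    """Extract features directly from the degraded input
    (no intermediate restoration stage)."""
    return [sum(row) / len(row) for row in blind_face_image][:DIM]

class FeatureManipulator:
    """Toy stand-in for the learned manipulator: turns single-view
    features into a 3D-aware latent for a target view by mixing in a
    view-angle embedding."""
    def __call__(self, feats, yaw_deg):
        a = math.radians(yaw_deg)
        emb = [math.sin(a * (i + 1)) for i in range(len(feats))]
        return [f + 0.1 * e for f, e in zip(feats, emb)]

def denoise(latent, steps=10):
    """Toy diffusion-style refinement: start from noise and iteratively
    pull the sample toward the view-conditioned latent."""
    x = [random.gauss(0.0, 1.0) for _ in latent]
    for _ in range(steps):
        x = [xi + 0.5 * (li - xi) for xi, li in zip(x, latent)]
    return x

# End-to-end, one stage: blind input -> features -> per-view latents -> views
blind_face = [[random.random() for _ in range(DIM)] for _ in range(DIM)]
feats = encode(blind_face)
manip = FeatureManipulator()
views = {yaw: denoise(manip(feats, yaw)) for yaw in (-30, 0, 30)}
```

Because every target view is conditioned on latents derived from the same single-view features, cross-view consistency is enforced at the latent level rather than depending on a separately restored image.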