🤖 AI Summary
Addressing the challenge of generating high-fidelity, view-consistent 360° surround views from a single portrait image, this paper proposes a diffusion-based generative framework, built on DiffPortrait3D, that is explicitly designed for 360° geometric and appearance consistency. Methodologically, it introduces a custom ControlNet for synthesizing back-of-head detail, a dual appearance module that enforces global geometric and textural coherence between front and back views, and a training scheme that combines continuous-view sequences with a generated back-of-head reference image. The resulting multi-view outputs can be used to build high-quality neural radiance fields (NeRFs) for real-time, free-viewpoint rendering from a single input image. Extensive evaluation demonstrates substantial improvements over state-of-the-art methods on challenging portraits (e.g., with eyewear or headwear), yielding locally smooth, globally consistent geometry and texture, and enabling immersive telepresence and personalized content creation.
📝 Abstract
Generating high-quality 360-degree views of human heads from single-view images is essential for enabling accessible immersive telepresence applications and scalable personalized content creation. While cutting-edge methods for full head generation are limited to modeling realistic human heads, the latest diffusion-based approaches for style-omniscient head synthesis can produce only frontal views and struggle with view consistency, preventing their conversion into true 3D models for rendering from arbitrary angles. We introduce a novel approach that generates fully consistent 360-degree head views, accommodating human, stylized, and anthropomorphic forms, including accessories like glasses and hats. Our method builds on the DiffPortrait3D framework, incorporating a custom ControlNet for back-of-head detail generation and a dual appearance module to ensure global front-back consistency. By training on continuous view sequences and integrating a back reference image, our approach achieves robust, locally continuous view synthesis. Our model can be used to produce high-quality neural radiance fields (NeRFs) for real-time, free-viewpoint rendering, outperforming state-of-the-art methods in object synthesis and 360-degree head generation for very challenging input portraits.
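To make the front/back conditioning idea concrete, here is a minimal toy sketch, not the paper's actual architecture: the "appearance encoder", the yaw-based fusion rule, and the denoising update are all simplified stand-ins invented for illustration. It only shows the general pattern of blending features from a front portrait and a generated back-of-head reference according to camera yaw while iteratively refining a noisy sample.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_reference(image):
    """Toy 'appearance encoder': per-image mean/std statistics.
    A hypothetical stand-in for the learned appearance modules."""
    return np.array([image.mean(), image.std()])

def denoise_step(x, t, front_feat, back_feat, camera_yaw):
    """One toy denoising step. Blends front/back appearance features
    by camera yaw (0 = front view, pi = back view); this fusion rule
    is an assumption for illustration, not the paper's method."""
    w_back = (1.0 - np.cos(camera_yaw)) / 2.0  # 0 at front, 1 at back
    appearance = (1.0 - w_back) * front_feat + w_back * back_feat
    # Pull the sample toward the blended appearance mean as t decreases.
    target_mean = appearance[0]
    return x + (t / 10.0) * (target_mean - x.mean())

# Toy "front portrait" and "generated back reference" images.
front = rng.normal(0.6, 0.1, (8, 8))
back = rng.normal(0.3, 0.1, (8, 8))
f_feat, b_feat = encode_reference(front), encode_reference(back)

x = rng.normal(0.0, 1.0, (8, 8))   # start from pure noise
for t in range(10, 0, -1):         # coarse-to-fine refinement
    # Render the back view: the back reference dominates the blend.
    x = denoise_step(x, t, f_feat, b_feat, camera_yaw=np.pi)

# The sample's mean statistics now match the back reference, so
# back-facing renders stay consistent with the back-of-head image.
```

The point of the sketch is the role of the second (back) reference branch: without it, views near 180° have no appearance signal to lock onto, which is the front-back inconsistency the dual appearance module is described as addressing.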