🤖 AI Summary
Existing head-swapping methods often suffer from poor 3D consistency, unnatural facial expressions, incomplete head modeling, and flawed background integration, producing visible artifacts and distortions. This work proposes a novel approach based on a dynamic neural Gaussian portrait prior, lifting 2D portrait videos into a 3D Gaussian feature field embedded within the SMPL-X full-body surface, marking the first application of dynamic neural Gaussian fields to full-head replacement. By leveraging a pre-trained 2D generative model for few-shot domain adaptation and introducing a neural re-rendering strategy for seamless foreground-background blending, the method achieves significant improvements over state-of-the-art techniques in visual fidelity, temporal coherence, identity preservation, and 3D consistency, suppressing artifacts and enhancing realism.
📝 Abstract
We present GSwap, a novel consistent and realistic video head-swapping system empowered by dynamic neural Gaussian portrait priors, which significantly advances the state of the art in face and head replacement. Unlike previous methods that rely primarily on 2D generative models or 3D Morphable Face Models (3DMM), our approach overcomes their inherent limitations, including poor 3D consistency, unnatural facial expressions, and restricted synthesis quality. Moreover, existing techniques struggle with full head-swapping tasks due to insufficient holistic head modeling and ineffective background blending, often resulting in visible artifacts and misalignments. To address these challenges, GSwap introduces an intrinsic 3D Gaussian feature field embedded within a full-body SMPL-X surface, effectively elevating 2D portrait videos into a dynamic neural Gaussian field. This innovation ensures high-fidelity, 3D-consistent portrait rendering while preserving natural head-torso relationships and seamless motion dynamics. To facilitate training, we adapt a pretrained 2D portrait generative model to the source head domain using only a few reference images, enabling efficient domain adaptation. Furthermore, we propose a neural re-rendering strategy that harmoniously integrates the synthesized foreground with the original background, eliminating blending artifacts and enhancing realism. Extensive experiments demonstrate that GSwap surpasses existing methods in multiple aspects, including visual quality, temporal coherence, identity preservation, and 3D consistency.
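The two core ideas in the abstract, a Gaussian feature field anchored to the posed SMPL-X surface and a re-rendering step that blends the synthesized head with the original background, can be sketched at a very high level as follows. This is a minimal illustration, not the paper's implementation: all names, shapes, and the rigid pose stand-in for SMPL-X skinning are assumptions.

```python
import numpy as np

# Illustrative sketch of a surface-embedded Gaussian field (not GSwap's code).
rng = np.random.default_rng(0)
num_points = 1000                                   # stand-in for SMPL-X surface samples
rest_surface = rng.normal(size=(num_points, 3))     # canonical anchor positions

# Each Gaussian stores a learnable offset from its surface anchor,
# a latent appearance feature, and an opacity.
offsets = 0.01 * rng.normal(size=(num_points, 3))
features = rng.normal(size=(num_points, 32))
opacities = 1.0 / (1.0 + np.exp(-rng.normal(size=num_points)))  # sigmoid

def pose_gaussians(rest_surface, offsets, rotation, translation):
    """Carry Gaussians along with the deforming surface:
    posed center = R @ (anchor + offset) + t."""
    return (rest_surface + offsets) @ rotation.T + translation

# A single rigid transform as a placeholder for per-vertex SMPL-X skinning.
theta = np.pi / 8
rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta),  np.cos(theta), 0.0],
                     [0.0,            0.0,           1.0]])
translation = np.array([0.0, 0.0, 2.0])
centers = pose_gaussians(rest_surface, offsets, rotation, translation)

def composite(foreground, alpha, background):
    """The blending step reduces, conceptually, to alpha-compositing the
    rendered head foreground over the original background frame."""
    return alpha[..., None] * foreground + (1.0 - alpha[..., None]) * background

fg = rng.random((64, 64, 3))      # rendered foreground (placeholder)
bg = rng.random((64, 64, 3))      # original background frame (placeholder)
alpha = rng.random((64, 64))      # foreground matte (placeholder)
frame = composite(fg, alpha, bg)
assert frame.shape == (64, 64, 3)
```

Because the Gaussians are parameterized relative to the surface, any body or head motion moves them consistently in 3D, which is the mechanism behind the claimed 3D consistency and natural head-torso dynamics.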