🤖 AI Summary
This paper addresses cross-identity facial pose and expression transfer. We propose a self-supervised method operating in the StyleGAN2 latent space, employing a dual-encoder–mapping architecture: a source-image encoder extracts pose and expression representations, while a target-image encoder captures identity features; these are fused by a latent-space mapping network that drives the StyleGAN2 generator to reconstruct the target identity under the source's pose and expression. Training is fully self-supervised, leveraging only inter-frame consistency from unlabeled video sequences; no manual labeling is required. Our key contributions are: (1) the first disentanglement-aware dual-path latent mapping mechanism for controllable editing; (2) fine-grained pose and expression transfer across arbitrary identities with explicit control; and (3) high-fidelity synthesis at near-real-time inference speed. Extensive experiments demonstrate superior qualitative and quantitative performance over existing unsupervised approaches.
📄 Abstract
We propose a method to transfer pose and expression between face images. Given a source and a target face portrait, the model produces an output image in which the pose and expression of the source are transferred onto the target identity. The architecture consists of two encoders and a mapping network that projects the two inputs into the latent space of StyleGAN2, which then generates the output. Training is self-supervised from video sequences of many individuals; manual labeling is not required. Our model enables the synthesis of random identities with controllable pose and expression, and achieves close-to-real-time performance.
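The data flow described above (two encoders feeding a mapping network that emits a StyleGAN2-style latent) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the dimensions, the function names (`encode_pose`, `encode_identity`, `map_to_w`), and the random linear "networks" are all hypothetical stand-ins for the trained modules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: image features flattened to 1024-d,
# pose/expression code 64-d, identity code 448-d, and a 512-d
# w latent matching StyleGAN2's latent width.
D_IMG, D_POSE, D_ID, D_W = 1024, 64, 448, 512

# Stand-in "encoders" and "mapping network" as random linear maps;
# in the paper these are trained networks, here only the shapes matter.
W_pose = rng.standard_normal((D_POSE, D_IMG)) * 0.01
W_id = rng.standard_normal((D_ID, D_IMG)) * 0.01
W_map = rng.standard_normal((D_W, D_POSE + D_ID)) * 0.01

def encode_pose(src_feats):
    """Source encoder: pose/expression representation of the source image."""
    return np.tanh(W_pose @ src_feats)

def encode_identity(tgt_feats):
    """Target encoder: identity representation of the target image."""
    return np.tanh(W_id @ tgt_feats)

def map_to_w(pose_code, id_code):
    """Mapping network: fuse both codes into one StyleGAN2-style w latent."""
    return W_map @ np.concatenate([pose_code, id_code])

# Stand-in feature vectors for a source and a target portrait.
src = rng.standard_normal(D_IMG)
tgt = rng.standard_normal(D_IMG)

w = map_to_w(encode_pose(src), encode_identity(tgt))
print(w.shape)  # (512,)
```

The resulting `w` would be fed to a (pretrained, frozen) StyleGAN2 generator; swapping `src` while keeping `tgt` fixed changes only the pose/expression code, which is what makes the editing controllable.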