ExpPortrait: Expressive Portrait Generation via Personalized Representation

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing portrait video generation methods struggle to simultaneously preserve identity consistency, achieve high expressiveness, and maintain temporal stability, primarily due to insufficient disentanglement between facial expressions and identity. This work proposes a high-fidelity personalized head representation that explicitly separates static identity geometry from dynamic expression details. To enhance both identity preservation and expression fidelity, we introduce an expression transfer module. Built upon a Diffusion Transformer (DiT) architecture, our approach conditions the generative process on the proposed head representation to enable end-to-end synthesis of high-quality portrait videos. Experiments demonstrate that our method outperforms current state-of-the-art techniques in both self-driven and cross-identity reenactment tasks, achieving superior performance in identity fidelity, expression accuracy, and temporal stability—particularly excelling at capturing fine-grained details during complex motion.

📝 Abstract
While diffusion models have shown great potential in portrait generation, generating expressive, coherent, and controllable cinematic portrait videos remains a significant challenge. Existing intermediate signals for portrait generation, such as 2D landmarks and parametric models, have limited disentanglement capabilities and cannot express personalized details due to their sparse or low-rank representation. Therefore, existing methods based on these models struggle to accurately preserve subject identity and expressions, hindering the generation of highly expressive portrait videos. To overcome these limitations, we propose a high-fidelity personalized head representation that more effectively disentangles expression and identity. This representation captures both static, subject-specific global geometry and dynamic, expression-related details. Furthermore, we introduce an expression transfer module to achieve personalized transfer of head pose and expression details between different identities. We use this sophisticated and highly expressive head model as a conditional signal to train a diffusion transformer (DiT)-based generator to synthesize richly detailed portrait videos. Extensive experiments on self- and cross-reenactment tasks demonstrate that our method outperforms previous models in terms of identity preservation, expression accuracy, and temporal stability, particularly in capturing fine-grained details of complex motion.
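The abstract's core idea, that the head representation separates static, subject-specific geometry from dynamic, expression-related details so the expression transfer module can recombine them across identities, can be illustrated with a toy sketch. All names and structures below are hypothetical illustrations of the disentanglement principle, not the paper's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class HeadRepresentation:
    """Hypothetical stand-in for the paper's personalized head representation."""
    identity: tuple    # static, subject-specific global geometry
    expression: tuple  # dynamic, expression-related details for one frame

def expression_transfer(source: HeadRepresentation,
                        driver: HeadRepresentation) -> HeadRepresentation:
    # Cross-identity reenactment: keep the source subject's static identity
    # geometry while adopting the driving frame's expression details.
    # The result would then condition the DiT-based generator.
    return HeadRepresentation(identity=source.identity,
                              expression=driver.expression)

# Source subject (identity to preserve) and one frame of a driving video.
source_subject = HeadRepresentation(identity=(0.9, 0.1), expression=(0.0, 0.0))
driving_frame = HeadRepresentation(identity=(0.2, 0.8), expression=(0.7, 0.3))

reenacted = expression_transfer(source_subject, driving_frame)
```

Because identity and expression live in separate components, the transfer is a simple recombination; in the paper this disentanglement is what lets the generator preserve identity under complex motion.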
Problem

Research questions and friction points this paper is trying to address.

portrait generation
expression preservation
identity disentanglement
personalized representation
cinematic video synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

personalized representation
expression disentanglement
diffusion transformer
portrait video generation
expression transfer
Authors

Junyi Wang
University of Electronic Science and Technology of China
Image Registration, MRI

Yudong Guo
University of Science and Technology of China

Boyang Guo
University of Science and Technology of China

Shengming Yang
University of Science and Technology of China

Juyong Zhang
University of Science and Technology of China
Computer Graphics, 3D Vision, Geometry Processing