LIA-X: Interpretable Latent Portrait Animator

📅 2025-08-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of fine-grained control and interpretability in facial motion transfer. We propose an interpretable portrait animation method based on a sparse motion dictionary, which constructs semantically aligned, linear motion bases in the latent space. Facial motions from a driving video are disentangled into editable, interpretable sparse factors, enabling a controllable "edit-warp-render" generation paradigm. The approach employs an autoencoder architecture trained with large-scale strategies that scale to models of roughly one billion parameters. Extensive evaluations on multiple benchmarks, covering self-reconstruction and cross-identity motion transfer, demonstrate significant improvements over state-of-the-art methods. Moreover, the framework enables high-fidelity user-guided editing and 3D-aware animation generation, achieving superior control accuracy, semantic transparency, and generalization, and bridging the gap between expressiveness and interpretability in neural face animation.

📝 Abstract
We introduce LIA-X, a novel interpretable portrait animator designed to transfer facial dynamics from a driving video to a source portrait with fine-grained control. LIA-X is an autoencoder that models motion transfer as a linear navigation of motion codes in latent space. Crucially, it incorporates a novel Sparse Motion Dictionary that enables the model to disentangle facial dynamics into interpretable factors. Deviating from previous 'warp-render' approaches, the interpretability of the Sparse Motion Dictionary allows LIA-X to support a highly controllable 'edit-warp-render' strategy, enabling precise manipulation of fine-grained facial semantics in the source portrait. This helps to narrow initial differences with the driving video in terms of pose and expression. Moreover, we demonstrate the scalability of LIA-X by successfully training a large-scale model with approximately 1 billion parameters on extensive datasets. Experimental results show that our proposed method outperforms previous approaches in both self-reenactment and cross-reenactment tasks across several benchmarks. Additionally, the interpretable and controllable nature of LIA-X supports practical applications such as fine-grained, user-guided image and video editing, as well as 3D-aware portrait video manipulation.
Problem

Research questions and friction points this paper is trying to address.

Transfer facial dynamics with fine-grained control
Disentangle facial dynamics into interpretable factors
Enable precise manipulation of facial semantics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autoencoder with linear latent motion navigation
Sparse Motion Dictionary for disentangled dynamics
Edit-warp-render strategy for precise facial control
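The innovations above can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: it assumes a latent dimension, a number of motion bases, and a `navigate` helper that are all hypothetical, and shows only the core idea of linear navigation of a latent code along a sparse motion dictionary, where editing one coefficient corresponds to the "edit" step of edit-warp-render.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a 512-d latent code and 20 motion bases.
LATENT_DIM, NUM_BASES = 512, 20

# Sparse Motion Dictionary stand-in: each column is a unit-norm motion
# basis, assumed to align with one semantic factor (e.g. yaw, blink).
D = rng.standard_normal((LATENT_DIM, NUM_BASES))
D /= np.linalg.norm(D, axis=0, keepdims=True)

def navigate(z_source, alpha, dictionary=D):
    """Linear latent navigation: shift the source code along the motion
    bases, weighted by the (sparse) coefficient vector alpha."""
    return z_source + dictionary @ alpha

# "Edit" step: tweak a single interpretable factor and leave the rest 0.
z_src = rng.standard_normal(LATENT_DIM)
alpha = np.zeros(NUM_BASES)
alpha[3] = 0.8                      # nudge one factor, e.g. head yaw
z_edit = navigate(z_src, alpha)

# Because alpha is sparse, the latent shift lies entirely in the span
# of the single edited basis.
assert np.allclose(z_edit - z_src, 0.8 * D[:, 3])
```

In this toy form, the interpretability claim reduces to a simple property: a one-hot coefficient vector moves the latent along exactly one basis, so each edit is attributable to a single semantic factor before the warp and render stages run.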
Yaohui Wang
Research Scientist, Shanghai AI Laboratory | Inria
Machine Learning · Deep Generative Models · Video Generation
Di Yang
Inria, Université Côte d'Azur
Xinyuan Chen
Shanghai Artificial Intelligence Laboratory
François Brémond
Inria, Université Côte d'Azur
Yu Qiao
Shanghai Artificial Intelligence Laboratory
Antitza Dantcheva
Directrice de Recherche, Inria, France
Video Generation · Deepfake Generation and Detection · Face Analysis for Health Monitoring and …