🤖 AI Summary
This work addresses the lack of fine-grained control and interpretability in facial motion transfer. We propose an interpretable portrait animation method built on a sparse motion dictionary, which constructs semantically aligned, linear motion bases in the latent space. Facial motions from a driving video are disentangled into editable, interpretable sparse factors, enabling a controllable “edit–warp–render” generation paradigm. The approach uses an autoencoder architecture whose large-scale training strategy scales to roughly one billion parameters. Extensive evaluations on multiple benchmarks, covering self-reenactment and cross-identity motion transfer, demonstrate significant improvements over state-of-the-art methods. The framework also supports high-fidelity user-guided editing and 3D-aware animation generation, achieving superior control accuracy, semantic transparency, and generalization, and bridging the gap between expressiveness and interpretability in neural face animation.
📝 Abstract
We introduce LIA-X, a novel interpretable portrait animator designed to transfer facial dynamics from a driving video to a source portrait with fine-grained control. LIA-X is an autoencoder that models motion transfer as linear navigation of motion codes in latent space. Crucially, it incorporates a novel Sparse Motion Dictionary that enables the model to disentangle facial dynamics into interpretable factors. Departing from previous 'warp-render' approaches, the interpretability of the Sparse Motion Dictionary allows LIA-X to support a highly controllable 'edit-warp-render' strategy, enabling precise manipulation of fine-grained facial semantics in the source portrait; this narrows initial pose and expression differences between the source portrait and the driving video. Moreover, we demonstrate the scalability of LIA-X by successfully training a large-scale model with approximately 1 billion parameters on extensive datasets. Experimental results show that our method outperforms previous approaches in both self-reenactment and cross-reenactment tasks across several benchmarks. The interpretable and controllable nature of LIA-X also supports practical applications such as fine-grained, user-guided image and video editing, as well as 3D-aware portrait video manipulation.
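To make the 'edit-warp-render' idea concrete, the sketch below illustrates linear navigation of a motion code over a dictionary of directions. All names, dimensions, and functions here are hypothetical illustrations of the general technique, not LIA-X's actual architecture or API: a motion offset is projected onto dictionary directions to obtain interpretable per-direction magnitudes, the user edits selected magnitudes, and the edited code would then feed the warp and render stages.

```python
import numpy as np

# Illustrative sketch only: dimensions and names are assumptions,
# not taken from the LIA-X paper or its released code.
rng = np.random.default_rng(0)
latent_dim, num_directions = 512, 20

# A stand-in "sparse motion dictionary": each row is a unit-norm motion
# direction in latent space (semantically aligned after training,
# e.g. head yaw or mouth opening; here just random for the sketch).
dictionary = rng.standard_normal((num_directions, latent_dim))
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

def motion_magnitudes(driving_code, source_code):
    """Project the driving-minus-source offset onto the dictionary,
    yielding one interpretable magnitude per motion direction."""
    return dictionary @ (driving_code - source_code)

def edit_and_navigate(source_code, magnitudes, edits=None):
    """'Edit' step: override chosen magnitudes, then navigate linearly
    from the source code along the dictionary directions."""
    m = magnitudes.copy()
    for idx, value in (edits or {}).items():
        m[idx] = value  # user-specified override for one factor
    return source_code + dictionary.T @ m

source_code = rng.standard_normal(latent_dim)
driving_code = rng.standard_normal(latent_dim)

mags = motion_magnitudes(driving_code, source_code)
# Suppress one motion factor (index 3) before warping/rendering.
edited_code = edit_and_navigate(source_code, mags, edits={3: 0.0})
```

In this toy setup, suppressing a single magnitude changes the navigated code while leaving the other factors' contributions intact, which is the kind of targeted control the paper attributes to the dictionary's interpretability.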