AI Summary
This paper addresses the challenges of identity preservation, precise expression transfer, and long-term temporal consistency in free-style portrait animation. To this end, we propose an efficient diffusion-based animation framework built upon Stable Diffusion. Methodologically: (1) expression-aware facial landmarks are introduced as explicit motion driving signals; (2) a fine-grained facial reconstruction loss is coupled with joint expression-mask supervision; and (3) a progressive generation strategy integrated with Taylor interpolation-based caching achieves a 2.6× lossless acceleration. Extensive evaluations on our newly constructed benchmark, EmojiBench++, demonstrate that our approach achieves state-of-the-art performance in animation fidelity, expression controllability, and identity consistency. It robustly supports high-quality, long-sequence animation generation across diverse styles, including realistic human faces, cartoons, sculptures, and animal portraits.
Abstract
We present Follow-Your-Emoji-Faster, an efficient diffusion-based framework for freestyle portrait animation driven by facial landmarks. The main challenges in this task are preserving the identity of the reference portrait, accurately transferring target expressions, and maintaining long-term temporal consistency while ensuring generation efficiency. To address identity preservation and accurate expression retargeting, we enhance Stable Diffusion with two key components: expression-aware landmarks as explicit motion signals, which improve motion alignment, support exaggerated expressions, and reduce identity leakage; and a fine-grained facial loss that leverages both expression and facial masks to better capture subtle expressions and faithfully preserve the reference appearance. With these components, our model supports controllable and expressive animation across diverse portrait types, including real faces, cartoons, sculptures, and animals. However, diffusion-based frameworks typically struggle to efficiently generate long-term stable animation results, which remains a core challenge in this task. To address this, we propose a progressive generation strategy for stable long-term animation and introduce a Taylor-interpolated cache, achieving a 2.6× lossless acceleration. These two strategies ensure that our method produces high-quality results efficiently, making it user-friendly and accessible. Finally, we introduce EmojiBench++, a more comprehensive benchmark comprising diverse portraits, driving videos, and landmark sequences. Extensive evaluations on EmojiBench++ demonstrate that Follow-Your-Emoji-Faster achieves superior performance in both animation quality and controllability. The code, training dataset, and benchmark will be available at https://follow-your-emoji.github.io/.
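The core idea behind a Taylor-interpolated cache can be illustrated with a minimal sketch (the helper below is hypothetical and not the paper's actual implementation): instead of recomputing an expensive network feature at every denoising step, cached features from earlier steps are extrapolated with a first-order Taylor expansion, whose derivative term is estimated by a finite difference.

```python
import numpy as np

def taylor_extrapolate(f_prev, f_curr, dt_prev, dt_next):
    """First-order Taylor extrapolation of a cached feature map.

    Approximates f(t + dt_next) ~ f(t) + dt_next * f'(t), where the
    derivative f'(t) is estimated from the two cached evaluations:
    (f(t) - f(t - dt_prev)) / dt_prev.
    """
    derivative = (f_curr - f_prev) / dt_prev
    return f_curr + dt_next * derivative

# Toy example: a feature that evolves linearly across steps is
# recovered exactly by the extrapolation, so skipping the full
# recomputation at that step loses no information.
def feature(t):
    return np.array([2.0 * t + 1.0])

f_prev, f_curr = feature(0.0), feature(1.0)
pred = taylor_extrapolate(f_prev, f_curr, dt_prev=1.0, dt_next=1.0)
print(pred)  # matches feature(2.0) = [5.0]
```

In a real diffusion pipeline the cached quantities would be intermediate UNet features and the step sizes would follow the sampler's timestep schedule; the speedup comes from replacing full forward passes at interpolated steps with this cheap extrapolation.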