AI Summary
Existing diffusion-based acceleration methods for long portrait animation suffer from severe identity (ID) consistency degradation. To address this, we propose an end-to-end framework for infinite-length video generation. Our method builds upon a video diffusion Transformer augmented with a pre-trained facial feature extractor. Key contributions include: (1) a normalized facial expression module that explicitly disentangles and stabilizes ID and expression; (2) an adaptive higher-order latent derivative prediction mechanism that enhances temporal modeling fidelity; and (3) a dynamic sliding-window strategy with weighted fusion over overlapping regions for efficient local-global collaborative inference. Evaluated on standard benchmarks, our framework achieves a 6× inference speedup over baseline diffusion models while significantly outperforming state-of-the-art methods in both ID preservation and motion coherence. Quantitative metrics and qualitative assessments consistently demonstrate superior performance.
Abstract
Current diffusion-based acceleration methods for long portrait animation struggle to ensure identity (ID) consistency. This paper presents FlashPortrait, an end-to-end video diffusion transformer capable of synthesizing ID-preserving, infinite-length videos while achieving up to a 6× acceleration in inference speed. In particular, FlashPortrait begins by computing identity-agnostic facial expression features with an off-the-shelf extractor. It then introduces a Normalized Facial Expression Block that aligns facial features with diffusion latents by normalizing both with their respective means and variances, thereby improving identity stability in facial modeling. During inference, FlashPortrait adopts a dynamic sliding-window scheme with weighted blending in overlapping areas, ensuring smooth transitions and ID consistency in long animations. Within each context window, guided by the latent variation rate at particular timesteps and the derivative magnitude ratio across diffusion layers, FlashPortrait uses higher-order latent derivatives at the current timestep to directly predict latents at future timesteps, skipping several denoising steps and yielding the 6× speedup. Experiments on standard benchmarks demonstrate the effectiveness of FlashPortrait both qualitatively and quantitatively.
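The Normalized Facial Expression Block described above can be illustrated with a minimal sketch. The abstract only states that facial features and latents are normalized by their respective means and variances; the AdaIN-style rescaling below, the `(batch, tokens, channels)` axis layout, and the function name `normalized_expression_align` are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def normalized_expression_align(expr_feats, latents, eps=1e-5):
    """Hypothetical sketch: whiten the expression features with their own
    statistics (stabilizing identity), then rescale them to the latent
    statistics so the two representations are aligned."""
    # Per-(batch, channel) statistics over the token axis -- an assumed layout.
    e_mu = expr_feats.mean(axis=1, keepdims=True)
    e_std = expr_feats.std(axis=1, keepdims=True)
    l_mu = latents.mean(axis=1, keepdims=True)
    l_std = latents.std(axis=1, keepdims=True)
    normalized = (expr_feats - e_mu) / (e_std + eps)  # identity-agnostic code
    return normalized * l_std + l_mu                  # matched to latent stats
```

After alignment, the expression features carry the latents' first- and second-order statistics, so they can be injected into the diffusion transformer without perturbing identity-related feature scales.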
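The step-skipping mechanism amounts to extrapolating latents forward in time from higher-order derivatives at the current timestep. A Taylor-expansion sketch is given below; the function name and the list-of-derivatives interface are assumptions for illustration, and the paper's adaptive criteria (latent variation rate, per-layer derivative magnitude ratio) for choosing when and how far to skip are not modeled here:

```python
import numpy as np

def extrapolate_latent(z_t, derivs, dt):
    """Predict the latent dt ahead via a Taylor expansion around the current
    timestep. derivs[k-1] holds the estimated k-th time derivative of the
    latent; summing the series skips the intermediate denoising steps."""
    z_next = z_t.copy()
    factorial = 1.0
    for k, d in enumerate(derivs, start=1):
        factorial *= k
        z_next += d * (dt ** k) / factorial  # k-th order Taylor term
    return z_next
```

With exact derivatives of a cubic trajectory, a third-order expansion reproduces the future latent exactly; in practice the derivatives are finite-difference estimates, so the expansion order and skip distance trade speed against fidelity.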
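The sliding-window blending can likewise be sketched. The abstract says only that overlapping areas are blended with weights; the linear cross-fade below is one simple choice of weighting, and the frame-major array layout is an assumption:

```python
import numpy as np

def blend_windows(prev_tail, next_head):
    """Cross-fade the overlap region between two consecutive context
    windows: the earlier window's weight decays linearly from 1 to 0
    across the overlapping frames (first axis)."""
    n = prev_tail.shape[0]
    w = np.linspace(1.0, 0.0, n)[:, None]  # weight for the earlier window
    return prev_tail * w + next_head * (1.0 - w)
```

Blending the overlap rather than concatenating hard window boundaries is what keeps transitions smooth and the identity stable across an arbitrarily long sequence of windows.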