PersonaLive! Expressive Portrait Image Animation for Live Streaming

📅 2025-12-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the poor real-time performance and unstable long-duration generation of diffusion-based portrait animation, this paper proposes the first real-time portrait animation framework tailored for ultra-low-latency live streaming. Methodologically: (1) a multi-stage collaborative training paradigm is introduced; (2) hybrid implicit motion control signals combine implicit facial representations with 3D implicit keypoints to capture dynamic facial expressions; and (3) few-step appearance distillation is paired with an autoregressive micro-chunk streaming generation mechanism, augmented by sliding-window training and historical keyframe caching. Experiments demonstrate a 7–22× inference speedup over prior diffusion-based methods. The framework maintains high visual fidelity and expressive realism while enabling stable, minute-long continuous generation, achieving both low streaming latency and minute-scale temporal coherence in diffusion-driven portrait animation. This work establishes a new balance between real-time responsiveness and perceptual expressiveness.

📝 Abstract
Current diffusion-based portrait animation models predominantly focus on enhancing visual quality and expression realism while overlooking generation latency and real-time performance, which limits their applicability in live streaming scenarios. We propose PersonaLive, a novel diffusion-based framework for streaming real-time portrait animation built on a multi-stage training recipe. Specifically, we first adopt hybrid implicit signals, namely implicit facial representations and 3D implicit keypoints, to achieve expressive image-level motion control. We then propose a few-step appearance distillation strategy that eliminates appearance redundancy in the denoising process, greatly improving inference efficiency. Finally, we introduce an autoregressive micro-chunk streaming generation paradigm equipped with a sliding-window training strategy and a historical keyframe mechanism to enable low-latency, stable long-term video generation. Extensive experiments demonstrate that PersonaLive achieves state-of-the-art performance with up to a 7–22× speedup over prior diffusion-based portrait animation models.
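The streaming paradigm in the abstract can be pictured as a generator loop: each micro-chunk is denoised conditioned on a sliding window of recent frames plus a cache of historical keyframes that anchors identity over long runs. The sketch below is a minimal illustration of that bookkeeping only; `denoise_chunk` is a hypothetical stand-in for the distilled few-step denoiser, and all constants (`CHUNK`, `WINDOW`, `KEYFRAME_EVERY`) are assumed values, not the paper's.

```python
from collections import deque

CHUNK = 4            # frames per micro-chunk (assumed value)
WINDOW = 8           # sliding context window, in frames (assumed value)
KEYFRAME_EVERY = 16  # cache one historical keyframe per this many frames (assumed)

def denoise_chunk(context, keyframes, motion):
    """Hypothetical stand-in for the few-step diffusion denoiser.

    Each 'frame' here is just a string recording what it was conditioned on,
    so the streaming bookkeeping can be inspected without a real model."""
    return [f"frame(ctx={len(context)},key={len(keyframes)})" for _ in range(CHUNK)]

def stream(motion_signals):
    """Autoregressive micro-chunk streaming with a sliding window and a
    historical keyframe cache, in the spirit of the mechanism described above."""
    context = deque(maxlen=WINDOW)   # recent frames; old ones fall off the window
    keyframes = []                   # long-term anchors against identity drift
    produced = 0
    for motion in motion_signals:    # one motion control signal per micro-chunk
        chunk = denoise_chunk(list(context), keyframes, motion)
        for frame in chunk:
            if produced % KEYFRAME_EVERY == 0:
                keyframes.append(frame)  # periodically promote a frame to keyframe
            context.append(frame)
            produced += 1
            yield frame              # frames leave as soon as the chunk is ready
```

Yielding per micro-chunk rather than per full clip is what keeps latency bounded: the consumer sees frames after one chunk's denoising, while the window plus keyframe cache carries continuity across minutes of output.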
Problem

Research questions and friction points this paper is trying to address.

Reduces latency for real-time portrait animation in live streaming
Improves inference efficiency by eliminating appearance redundancy
Enables stable long-term video generation with low latency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid implicit signals for motion control
Fewer-step appearance distillation for efficiency
Autoregressive micro-chunk streaming for low latency
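The first innovation above pairs two implicit control signals. As a toy illustration only: the fusion operator below (plain concatenation) and the dimensions are assumptions, since the digest does not specify how the two signals are combined.

```python
import numpy as np

def hybrid_motion_signal(face_repr: np.ndarray, kp3d: np.ndarray) -> np.ndarray:
    """Fuse an implicit facial representation vector with flattened 3D implicit
    keypoints into one motion condition. Concatenation is an illustrative
    assumption, not the paper's stated operator."""
    return np.concatenate([face_repr, kp3d.reshape(-1)])

# e.g. a 128-d facial embedding plus 21 implicit keypoints in 3D (assumed sizes)
cond = hybrid_motion_signal(np.zeros(128), np.zeros((21, 3)))
```

The resulting vector would then condition each denoising step, letting a single control pathway carry both holistic expression and geometric head motion.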