EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration

📅 2026-05-14
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing methods for long-duration human animation generate video chunk by chunk, so visual quality degrades and character identity becomes inconsistent as errors accumulate across chunks. To address this, the work proposes a persistent latent context memory that anchors identity and motion features across chunks, combined with Restorative Flow Matching, which adjusts the sampling velocity toward an implicit restoration objective to improve within-chunk fidelity. Requiring only lightweight LoRA fine-tuning, the method outperforms state-of-the-art long-animation approaches on videos from 10 to 90 seconds: at 10 seconds, PSNR and SSIM improve by 8% and 7% while LPIPS and FID decrease by 22% and 11%; at 90 seconds, PSNR and SSIM each improve by 15% while LPIPS and FID decrease by 32% and 27%.
📝 Abstract
We propose EverAnimate, an efficient post-training method for long-horizon animated video generation that preserves visual quality and character identity. Long-form animation remains challenging because highly dynamic human motion must be synthesized against relatively static environments, making chunk-based generation prone to accumulated drift: (i) low-level quality drift, such as progressive degradation of static backgrounds, and (ii) high-level semantic drift, such as inconsistent character identity and view-dependent attributes. To address this issue, EverAnimate restores drifted flow trajectories by anchoring generation to a persistent latent context memory, consisting of two complementary mechanisms. (i) Persistent Latent Propagation maintains a context memory across chunks to propagate identity and motion in latent space while mitigating temporal forgetting. (ii) Restorative Flow Matching introduces an implicit restoration objective during sampling through velocity adjustment, improving within-chunk fidelity. With only lightweight LoRA tuning, EverAnimate outperforms state-of-the-art long-animation methods in both short- and long-horizon settings: at 10 seconds, it improves PSNR/SSIM by 8%/7% and reduces LPIPS/FID by 22%/11%; at 90 seconds, the gains increase to 15%/15% and 32%/27%, respectively.
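The accumulated drift and the anchoring idea described in the abstract can be illustrated with a toy sketch. This is not EverAnimate's implementation. The function names (`sample_chunk`, `generate`), the constant `bias` that stands in for a small systematic error in a learned velocity field, and the `anchor_weight` blend are all assumptions made for illustration, and the paper's Restorative Flow Matching objective is not reproduced here. The sketch only compares conditioning each chunk on the previous chunk alone against also blending in a persistent reference latent.

```python
import numpy as np

def sample_chunk(cond, steps=8, bias=0.05, seed=0):
    """Euler-integrate a toy flow ODE from Gaussian noise toward `cond`.

    `bias` is a hypothetical stand-in for systematic velocity-field error.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(cond.shape)        # start from noise at t = 0
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = (cond - x) / (1.0 - t) + bias      # straight-path velocity plus model error
        x = x + dt * v                         # Euler step
    return x

def generate(reference, n_chunks=10, anchor_weight=0.0):
    """Generate chunks in sequence; return each chunk's mean drift from `reference`.

    anchor_weight = 0 conditions each chunk only on the previous chunk's output.
    anchor_weight > 0 blends in a persistent copy of the reference latent
    (a hypothetical, simplified form of a context memory).
    """
    memory = reference.copy()                  # persistent context, never overwritten
    prev = reference.copy()
    drifts = []
    for i in range(n_chunks):
        cond = (1.0 - anchor_weight) * prev + anchor_weight * memory
        prev = sample_chunk(cond, seed=i)
        drifts.append(float(np.abs(prev - reference).mean()))
    return drifts

reference = np.zeros(16)                       # stand-in for an identity/background latent
chained = generate(reference, anchor_weight=0.0)
anchored = generate(reference, anchor_weight=0.5)
print(f"drift after 10 chunks, chained:  {chained[-1]:.4f}")
print(f"drift after 10 chunks, anchored: {anchored[-1]:.4f}")
```

In this toy setting the chained run's drift grows with every chunk, while the anchored run's drift levels off because each chunk is pulled back toward the same stored reference. The real method operates on video latents with a learned model, so the numbers here carry no meaning beyond showing the trend.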
Problem

Research questions and friction points this paper is trying to address.

long-horizon animation
human motion synthesis
accumulated drift
character identity consistency
video generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

latent flow restoration
persistent latent propagation
restorative flow matching
long-horizon animation
identity consistency