Screen, Cache, and Match: A Training-Free Causality-Consistent Reference Frame Framework for Human Animation

📅 2025-12-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses long-sequence human animation, where temporal coherence and visual consistency are hard to maintain and failures show up as identity drift and poorly modeled long-range dependencies. The authors propose FrameCache, a training-free, causality-consistent reference frame framework that turns previously generated results into causal guidance. At the reference level, a Screen-Cache-Match (SCM) strategy builds a dynamic, high-quality reference memory that supplies motion-consistent appearance guidance and reduces identity drift. At the generative level, a Trajectory-Aware Autoregressive Generation (TAAG) mechanism aligns denoising trajectories across adjacent video chunks, using overlap-aware latent propagation and a dual-domain fusion strategy that blends low-frequency structural layout with high-frequency textural detail. According to the authors, FrameCache integrates with diverse diffusion baselines and consistently improves temporal coherence and visual stability on standard benchmarks.
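To make the idea of stitching adjacent chunks through their overlapping frames concrete, here is a minimal sketch of a generic linear cross-fade over the overlap region. It is an illustration only and is not taken from the paper: the function name `crossfade_chunks`, the `overlap` parameter, and the linear weighting are all assumptions, and FrameCache's actual overlap-aware latent propagation may work quite differently.

```python
import numpy as np


def crossfade_chunks(prev_chunk, next_chunk, overlap):
    """Join two latent chunks of shape (T, ...) that share `overlap` frames.

    Hypothetical helper: the last `overlap` frames of `prev_chunk` are assumed
    to depict the same time steps as the first `overlap` frames of `next_chunk`.
    """
    if overlap <= 0 or overlap > min(len(prev_chunk), len(next_chunk)):
        raise ValueError("overlap must be in 1..min(len(prev), len(next))")

    # Weights for the next chunk rise linearly across the overlap,
    # excluding the endpoints 0 and 1 so both chunks always contribute.
    weights = np.linspace(0.0, 1.0, overlap + 2)[1:-1]
    # Reshape to (overlap, 1, 1, ...) so the weights broadcast over each frame.
    weights = weights.reshape((overlap,) + (1,) * (prev_chunk.ndim - 1))

    blended = (1.0 - weights) * prev_chunk[-overlap:] + weights * next_chunk[:overlap]
    return np.concatenate(
        [prev_chunk[:-overlap], blended, next_chunk[overlap:]], axis=0
    )


# Usage: two 4-frame chunks sharing 2 frames give a 6-frame sequence.
prev_latents = np.zeros((4, 2))
next_latents = np.ones((4, 2))
stitched = crossfade_chunks(prev_latents, next_latents, overlap=2)
```

A cross-fade like this only smooths the seam after generation; the summary describes alignment of denoising trajectories, which would act during sampling rather than afterwards.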

📝 Abstract

Human animation aims to generate temporally coherent and visually consistent videos over long sequences, yet modeling long-range dependencies while preserving frame quality remains challenging. Inspired by the human ability to leverage past observations for interpreting ongoing actions, we propose FrameCache, a training-free, causality-consistent reference frame framework. FrameCache explicitly converts historical generation results into causal guidance through two complementary mechanisms. First, at the reference level, a novel Screen-Cache-Match (SCM) strategy constructs a dynamic, high-quality reference memory, ensuring motion-consistent appearance guidance to reduce identity drift. Second, at the generative level, a Trajectory-Aware Autoregressive Generation (TAAG) mechanism aligns denoising trajectories across adjacent video chunks. This is achieved through an overlap-aware latent propagation and a dual-domain fusion strategy that seamlessly blends low-frequency structural layouts with high-frequency textural details. Extensive experiments on standard benchmarks demonstrate that FrameCache consistently improves temporal coherence and visual stability while integrating seamlessly with diverse diffusion baselines. Code will be made publicly available.
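The blend of low-frequency structure with high-frequency texture can be illustrated with a small frequency-domain sketch. This is an assumption-laden example, not the authors' implementation: the function name `fuse_frequency_bands`, the hard radial mask, and the `cutoff` parameter (in cycles per pixel) are hypothetical, and the abstract does not say which two domains the dual-domain fusion operates in or which signals it combines.

```python
import numpy as np


def fuse_frequency_bands(structure_src, texture_src, cutoff=0.25):
    """Combine low spatial frequencies of one array with high ones of another.

    Hypothetical helper: both inputs have shape (..., H, W). Frequencies with
    radius <= `cutoff` (cycles per pixel) come from `structure_src`; the rest
    come from `texture_src`.
    """
    if structure_src.shape != texture_src.shape:
        raise ValueError("inputs must have the same shape")

    height, width = structure_src.shape[-2:]
    freq_y = np.fft.fftfreq(height)[:, None]
    freq_x = np.fft.fftfreq(width)[None, :]
    radius = np.sqrt(freq_y**2 + freq_x**2)

    # Hard low-pass mask; a smooth (e.g. Gaussian) mask would reduce ringing.
    low_mask = (radius <= cutoff).astype(np.float64)

    structure_spec = np.fft.fft2(structure_src, axes=(-2, -1))
    texture_spec = np.fft.fft2(texture_src, axes=(-2, -1))
    fused_spec = low_mask * structure_spec + (1.0 - low_mask) * texture_spec

    # The mask is symmetric, so for real inputs the imaginary part is
    # numerical noise only and can be dropped.
    return np.fft.ifft2(fused_spec, axes=(-2, -1)).real


# Usage: layout from one latent, fine detail from another.
rng = np.random.default_rng(0)
layout_latent = rng.standard_normal((2, 8, 8))
detail_latent = rng.standard_normal((2, 8, 8))
fused = fuse_frequency_bands(layout_latent, detail_latent, cutoff=0.25)
```

With `cutoff=0.0` only the per-frame mean is taken from the structure source, and with a cutoff above the largest representable frequency the output equals the structure source, which makes the parameter easy to sanity-check.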

Problem

Research questions and friction points this paper is trying to address.

human animation

temporal coherence

long-range dependencies

visual consistency

identity drift

Innovation

Methods, ideas, or system contributions that make the work stand out.

FrameCache

Screen-Cache-Match

Trajectory-Aware Autoregressive Generation