Screen, Cache, and Match: A Training-Free Causality-Consistent Reference Frame Framework for Human Animation

📅 2025-12-13
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses long-sequence human animation, where temporal coherence and visual consistency are hard to maintain and generated videos often suffer from identity drift and weak modeling of long-range dependencies. The authors propose FrameCache, a training-free, causality-consistent reference frame framework that turns previously generated results into causal guidance. At the reference level, a Screen-Cache-Match (SCM) strategy builds a dynamic, high-quality reference memory that mitigates identity drift. At the generative level, a Trajectory-Aware Autoregressive Generation (TAAG) mechanism aligns denoising trajectories across adjacent video chunks, using overlap-aware latent propagation and a dual-domain fusion strategy that blends low-frequency structural layout with high-frequency textural detail. According to the authors, FrameCache integrates with a range of diffusion baselines and consistently improves temporal coherence and visual stability on standard benchmarks.
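
To make the screen, cache, and match steps concrete, here is a minimal sketch of a reference memory. It is not the authors' implementation: the summary does not say how frames are scored or matched, so the scalar `quality` score, the `pose` vector, the Euclidean distance, and the `min_quality` and `capacity` parameters are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ReferenceCache:
    """Hypothetical reference memory; all names and thresholds are illustrative."""
    min_quality: float = 0.5  # assumed screening threshold
    capacity: int = 8         # assumed maximum number of cached references
    entries: list = field(default_factory=list)  # (quality, pose, frame_id)

    def screen_and_cache(self, frame_id, quality, pose):
        """Screen out low-quality frames, then keep the best `capacity` frames.

        Returns True if the frame is in the cache after the update.
        """
        if quality < self.min_quality:
            return False  # screened out
        self.entries.append((quality, tuple(pose), frame_id))
        # Keep only the highest-quality entries.
        self.entries.sort(key=lambda e: e[0], reverse=True)
        del self.entries[self.capacity:]
        return any(e[2] == frame_id for e in self.entries)

    def match(self, target_pose):
        """Return the id of the cached frame whose pose is closest to the target."""
        if not self.entries:
            return None
        best = min(self.entries, key=lambda e: math.dist(e[1], target_pose))
        return best[2]
```

In this sketch a generation loop would call `screen_and_cache` on each newly generated frame and `match` with the next driving pose to choose a reference, so that only past frames are ever used as guidance.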

📝 Abstract
Human animation aims to generate temporally coherent and visually consistent videos over long sequences, yet modeling long-range dependencies while preserving frame quality remains challenging. Inspired by the human ability to leverage past observations for interpreting ongoing actions, we propose FrameCache, a training-free, causality-consistent reference frame framework. FrameCache explicitly converts historical generation results into causal guidance through two complementary mechanisms. First, at the reference level, a novel Screen-Cache-Match (SCM) strategy constructs a dynamic, high-quality reference memory, ensuring motion-consistent appearance guidance to reduce identity drift. Second, at the generative level, a Trajectory-Aware Autoregressive Generation (TAAG) mechanism aligns denoising trajectories across adjacent video chunks. This is achieved through an overlap-aware latent propagation and a dual-domain fusion strategy that seamlessly blends low-frequency structural layouts with high-frequency textural details. Extensive experiments on standard benchmarks demonstrate that FrameCache consistently improves temporal coherence and visual stability while integrating seamlessly with diverse diffusion baselines. Code will be made publicly available.
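
One simple way to picture blending low-frequency structure with high-frequency texture is a frequency split on a single 2D latent channel. The sketch below is an assumption-laden illustration, not the paper's dual-domain fusion: the abstract does not specify the domains, the filter, or which chunk contributes which band, so the function name `frequency_blend`, the radial mask, and the `cutoff` value are hypothetical.

```python
import numpy as np


def frequency_blend(prev_latent, curr_latent, cutoff=0.25):
    """Combine low frequencies of prev_latent with high frequencies of curr_latent.

    Both inputs are real 2D arrays of the same shape (H, W). `cutoff` is an
    assumed radius in cycles per sample; frequencies at or below it count as "low".
    """
    if prev_latent.shape != curr_latent.shape:
        raise ValueError("latents must have the same shape")
    h, w = prev_latent.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies
    low_mask = (np.sqrt(fx ** 2 + fy ** 2) <= cutoff).astype(float)

    prev_f = np.fft.fft2(prev_latent)
    curr_f = np.fft.fft2(curr_latent)
    # Structure (low band) from the previous chunk, texture (high band) from the current one.
    fused = low_mask * prev_f + (1.0 - low_mask) * curr_f
    # The mask is symmetric under frequency negation, so the result is real up to rounding.
    return np.fft.ifft2(fused).real
```

Applied to the frames shared by two adjacent chunks, a blend of this kind would keep the coarse layout from the earlier chunk while letting the later chunk supply fine detail. Which band comes from which chunk is a design choice of this sketch, not something the abstract states.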
Problem

Research questions and friction points this paper is trying to address.

human animation
temporal coherence
long-range dependencies
visual consistency
identity drift
Innovation

Methods, ideas, or system contributions that make the work stand out.

FrameCache
Screen-Cache-Match
Trajectory-Aware Autoregressive Generation
causality-consistent
training-free

Authors

Jianan Wang
Astribot / IDEA / Deepmind / Oxford
Computer Vision, Generative AI, Robotics, Learning Theory

Nailei Hei
College of Intelligent Robotics and Advanced Manufacturing, Fudan University, Shanghai, China

Li He
College of Intelligent Robotics and Advanced Manufacturing, Fudan University, Shanghai, China

Huanzhen Wang
College of Intelligent Robotics and Advanced Manufacturing, Fudan University, Shanghai, China

Aoxing Li
College of Intelligent Robotics and Advanced Manufacturing, Fudan University, Shanghai, China

Yingkai Zhao
College of Intelligent Robotics and Advanced Manufacturing, Fudan University, Shanghai, China

Yuxuan Lin
College of Computer Science and Artificial Intelligence, Fudan University
Computer Vision, Multimodal Learning, Embodied AI

Haofen Wang
Tongji University
Knowledge Graph, Natural Language Processing, Retrieval Augmented Generation

Chunyang Wang
School of Data Science and Engineering, East China Normal University, Shanghai, China

Yan Wang
Professor in East China Normal University
computer vision, medical image analysis

Wenqiang Zhang
School of Computer Science, Fudan University
Robotic, Medical Image, Computer vision