DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing character animation methods face a trade-off between identity preservation and motion consistency and rely on explicit pose priors, limiting their generalization to non-humanoid characters. This work proposes a universal animation framework that dispenses with explicit pose representations by formulating motion conditioning as a spatiotemporal context learning problem. Through a two-stage paradigm, the method fuses reference images and driving motions into a unified latent space and enables end-to-end RGB-based animation via bootstrapped pseudo-label data synthesis. By leveraging foundation models to generate motion priors and pseudo cross-identity training pairs, the approach substantially enhances cross-domain generalization for high-fidelity animation. Evaluated on the newly introduced AW Bench benchmark, the proposed method achieves state-of-the-art performance in both visual fidelity and generalization capability.
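The summary's core idea of fusing reference appearance and driving motion into one latent sequence can be illustrated with a minimal sketch. The paper does not publish implementation details on this page, so the shapes, the `build_context_sequence` helper, and the flat token concatenation below are all assumptions, not the authors' actual architecture; the sketch only shows the "spatiotemporal in-context" idea of letting one transformer sequence carry both spatial identity tokens and temporal motion tokens.

```python
import numpy as np

def build_context_sequence(ref_tokens, motion_tokens):
    """Concatenate reference-image tokens and per-frame motion tokens
    into one sequence, tagging each token with a frame id so a
    transformer could attend jointly over spatial identity and
    temporal dynamics (hypothetical sketch, not the paper's code).

    ref_tokens:    (n_ref, d)      tokens from the reference image
    motion_tokens: (T, n_frame, d) tokens from the driving frames
    """
    T, n_frame, d = motion_tokens.shape
    seq = [ref_tokens] + [motion_tokens[t] for t in range(T)]
    tokens = np.concatenate(seq, axis=0)  # (n_ref + T * n_frame, d)
    # frame id 0 marks the reference; ids 1..T mark driving frames
    frame_ids = np.concatenate(
        [np.zeros(len(ref_tokens), dtype=int)]
        + [np.full(n_frame, t + 1, dtype=int) for t in range(T)]
    )
    return tokens, frame_ids

# toy shapes: 4 reference tokens, 3 driving frames of 2 tokens each, dim 8
rng = np.random.default_rng(0)
tokens, frame_ids = build_context_sequence(
    rng.standard_normal((4, 8)), rng.standard_normal((3, 2, 8))
)
print(tokens.shape)        # (10, 8)
print(frame_ids.tolist())  # [0, 0, 0, 0, 1, 1, 2, 2, 3, 3]
```

In this toy layout the model receives no explicit pose prior; any notion of pose would have to be inferred from the motion tokens themselves, which is the pose-free property the summary emphasizes.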

📝 Abstract
Character image animation aims to synthesize high-fidelity videos by transferring motion from a driving sequence to a static reference image. Despite recent advancements, existing methods suffer from two fundamental challenges: (1) suboptimal motion injection strategies that lead to a trade-off between identity preservation and motion consistency, manifesting as a "see-saw" effect, and (2) an over-reliance on explicit pose priors (e.g., skeletons), which inadequately capture intricate dynamics and hinder generalization to arbitrary, non-humanoid characters. To address these challenges, we present DreamActor-M2, a universal animation framework that reimagines motion conditioning as an in-context learning problem. Our approach follows a two-stage paradigm. First, we bridge the input modality gap by fusing reference appearance and motion cues into a unified latent space, enabling the model to jointly reason about spatial identity and temporal dynamics by leveraging the generative prior of foundation models. Second, we introduce a self-bootstrapped data synthesis pipeline that curates pseudo cross-identity training pairs, facilitating a seamless transition from pose-dependent control to direct, end-to-end RGB-driven animation. This strategy significantly enhances generalization across diverse characters and motion scenarios. To facilitate comprehensive evaluation, we further introduce AW Bench, a versatile benchmark encompassing a wide spectrum of character types and motion scenarios. Extensive experiments demonstrate that DreamActor-M2 achieves state-of-the-art performance, delivering superior visual fidelity and robust cross-domain generalization. Project Page: https://grisoon.github.io/DreamActor-M2/
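The second stage described in the abstract, bootstrapping pseudo cross-identity training pairs, can be sketched as a simple data pipeline: a stage-1 (pose-conditioned) model animates character B with motion taken from character A's clip, and the output serves as a pseudo label for training the stage-2 RGB-driven model. The sketch below is a hypothetical illustration; the function name `synthesize_pseudo_pairs`, the data layout, and the `stage1_animate` stand-in are all assumptions, not the paper's actual pipeline.

```python
def synthesize_pseudo_pairs(clips, characters, stage1_animate):
    """For each driving clip and each character with a different identity,
    run a stage-1 pose-conditioned animator to produce a pseudo target
    video. The resulting triplet (reference of B, raw RGB frames of A,
    pseudo video of B performing A's motion) can then supervise an
    RGB-driven model without real cross-identity ground truth.
    (Hypothetical sketch of the idea, not the authors' code.)
    """
    pairs = []
    for clip in clips:
        for char in characters:
            if char["id"] == clip["id"]:
                continue  # skip same-identity combinations
            pseudo = stage1_animate(char["reference"], clip["frames"])
            pairs.append({
                "reference": char["reference"],  # identity B
                "driving": clip["frames"],       # RGB motion source from A
                "target": pseudo,                # pseudo label for stage 2
            })
    return pairs

# toy demo with a stand-in for the stage-1 animator
clips = [{"id": "hero", "frames": ["f0", "f1"]},
         {"id": "cat", "frames": ["g0"]}]
characters = [{"id": "hero", "reference": "hero.png"},
              {"id": "cat", "reference": "cat.png"},
              {"id": "dragon", "reference": "dragon.png"}]
fake_stage1 = lambda ref, frames: [f"{ref}@{f}" for f in frames]

pairs = synthesize_pseudo_pairs(clips, characters, fake_stage1)
print(len(pairs))  # 4 cross-identity triplets
```

Note that every emitted triplet pairs an RGB driving sequence with a target of a different identity, which is exactly the supervision signal the abstract says enables the transition from pose-dependent control to end-to-end RGB-driven animation.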
Problem

Research questions and friction points this paper is trying to address.

character image animation
motion injection
identity preservation
pose priors
cross-domain generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

spatiotemporal in-context learning
pose-free animation
universal character animation
self-bootstrapped data synthesis
foundation model prior