DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance

📅 2025-04-02
🤖 AI Summary
Existing image-driven human animation methods exhibit notable limitations in fine-grained global controllability, multi-scale adaptability, and long-term temporal consistency. To address these challenges, we propose a highly controllable, expressive, and robust animation generation framework. Our method introduces a novel hybrid motion guidance mechanism integrating implicit facial representations, a 3D head sphere, and a 3D skeletal structure; a progressive multi-resolution training strategy to enhance scale adaptability; and an appearance guidance paradigm that jointly leverages inter-frame motion patterns and visual references. Built upon the diffusion Transformer (DiT), the framework unifies hybrid control signal modeling, frame-wise motion sequence modeling, and complementary visual reference fusion. Extensive experiments demonstrate that our approach consistently outperforms state-of-the-art methods across portrait, upper-body, and full-body animation tasks, achieving significant improvements in facial expressiveness, identity fidelity, and temporal coherence over hundreds of frames.

📝 Abstract
While recent image-based human animation methods achieve realistic body and facial motion synthesis, critical gaps remain in fine-grained holistic controllability, multi-scale adaptability, and long-term temporal coherence, which limits their expressiveness and robustness. We propose a diffusion transformer (DiT) based framework, DreamActor-M1, with hybrid guidance to overcome these limitations. For motion guidance, our hybrid control signals that integrate implicit facial representations, 3D head spheres, and 3D body skeletons achieve robust control of facial expressions and body movements, while producing expressive and identity-preserving animations. For scale adaptation, to handle various body poses and image scales ranging from portraits to full-body views, we employ a progressive training strategy using data with varying resolutions and scales. For appearance guidance, we integrate motion patterns from sequential frames with complementary visual references, ensuring long-term temporal coherence for unseen regions during complex movements. Experiments demonstrate that our method outperforms state-of-the-art methods, delivering expressive results for portrait, upper-body, and full-body generation with robust long-term consistency. Project Page: https://grisoon.github.io/DreamActor-M1/.
Problem

Research questions and friction points this paper is trying to address.

Achieving fine-grained holistic controllability in human animation
Enhancing multi-scale adaptability across varied poses and framings
Ensuring long-term temporal coherence during complex movements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion transformer framework with hybrid guidance
Hybrid control signals for robust motion
Progressive training for multi-scale adaptation
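The progressive training idea above can be sketched as a simple stage schedule that unlocks higher resolutions and wider framing scales (portrait, then upper-body, then full-body) as training proceeds. The stage lengths, resolutions, and scale buckets below are illustrative assumptions for exposition, not the paper's actual hyperparameters.

```python
# Hypothetical sketch of a progressive multi-resolution training schedule.
# Each stage specifies how many steps it lasts, the frame resolution used,
# and which framing scales are sampled. Values are illustrative only.

STAGES = [
    # (steps_in_stage, frame_resolution, allowed_scales)
    (10_000, 256, ["portrait"]),
    (10_000, 512, ["portrait", "upper_body"]),
    (20_000, 768, ["portrait", "upper_body", "full_body"]),
]

def stage_for_step(step: int):
    """Return (resolution, allowed_scales) for a given global training step."""
    boundary = 0
    for steps, resolution, scales in STAGES:
        boundary += steps
        if step < boundary:
            return resolution, scales
    # Past the final boundary: remain in the last (full) stage.
    return STAGES[-1][1], STAGES[-1][2]
```

A training loop would call `stage_for_step` each iteration to decide how to resize frames and which data bucket to sample from, so early training sees only low-resolution portraits while later stages mix in full-body clips at higher resolution.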
Authors
Yuxuan Luo (City University of Hong Kong)
Zhengkun Rong (Bytedance Intelligent Creation)
Lizhen Wang (Bytedance Intelligent Creation)
Longhao Zhang (Bytedance Intelligent Creation)
Tianshu Hu (Bytedance Intelligent Creation)
Yongming Zhu (Bytedance Intelligent Creation)