MultiAnimate: Pose-Guided Image Animation Made Extensible

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing pose-guided image animation methods often suffer from identity confusion and implausible occlusions in multi-character scenes, limiting their scalability. This work proposes a scalable multi-character animation framework based on a Diffusion Transformer (DiT), which jointly models individual identities, spatial positions, and inter-character relationships through coordinated Identifier Assigner and Identifier Adapter modules. By integrating a mask-driven strategy with a scalable training mechanism, the method achieves, for the first time, generalization to an arbitrary number of characters using only two-character training data. The approach significantly outperforms existing diffusion-based baselines on multi-character image animation and generalizes to complex multi-character scenarios beyond the training distribution.

📝 Abstract
Pose-guided human image animation aims to synthesize realistic videos of a reference character driven by a sequence of poses. While diffusion-based methods have achieved remarkable success, most existing approaches are limited to single-character animation. We observe that naively extending these methods to multi-character scenarios often leads to identity confusion and implausible occlusions between characters. To address these challenges, we propose an extensible multi-character image animation framework built upon modern Diffusion Transformers (DiTs) for video generation. At its core, our framework introduces two novel components, an Identifier Assigner and an Identifier Adapter, which collaboratively capture per-person positional cues and inter-person spatial relationships. This mask-driven scheme, along with a scalable training strategy, not only enhances flexibility but also enables generalization to scenarios with more characters than those seen during training. Remarkably, trained on only a two-character dataset, our model generalizes to multi-character animation while maintaining compatibility with single-character cases. Extensive experiments demonstrate that our approach achieves state-of-the-art performance in multi-character image animation, surpassing existing diffusion-based baselines.
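The abstract does not specify the internals of the Identifier Assigner, but the general idea of a mask-driven identifier scheme can be sketched as follows: each character's latent tokens, selected by a per-character mask, receive a distinct learned identifier embedding before entering the DiT. All class names, shapes, and the additive design below are illustrative assumptions, not the authors' implementation.

```python
import torch

class IdentifierAssigner(torch.nn.Module):
    """Illustrative sketch (not the paper's code): add a learned identifier
    embedding to the latent tokens each character occupies, as selected by
    per-character masks."""

    def __init__(self, num_ids: int, dim: int):
        super().__init__()
        # One learnable embedding per character identifier slot.
        self.id_embed = torch.nn.Embedding(num_ids, dim)

    def forward(self, tokens: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
        """
        tokens: (B, N, D) latent video tokens fed to the DiT.
        masks:  (B, K, N) binary/soft per-character masks over tokens.
        Returns tokens with each character's identifier embedding added
        to the token positions that character occupies.
        """
        B, K, N = masks.shape
        ids = torch.arange(K, device=tokens.device)          # (K,)
        id_vecs = self.id_embed(ids)                         # (K, D)
        # (B, K, N, 1) * (K, 1, D) broadcasts to (B, K, N, D);
        # summing over K yields the per-token identifier signal (B, N, D).
        id_signal = (masks.unsqueeze(-1) * id_vecs[:, None, :]).sum(dim=1)
        return tokens + id_signal

# Toy usage: two characters with disjoint masks over 6 tokens.
assigner = IdentifierAssigner(num_ids=4, dim=8)
tokens = torch.zeros(1, 6, 8)
masks = torch.zeros(1, 2, 6)
masks[0, 0, :3] = 1.0   # character 0 occupies tokens 0-2
masks[0, 1, 3:] = 1.0   # character 1 occupies tokens 3-5
out = assigner(tokens, masks)
```

Because the identifier signal depends only on the mask index, not on appearance, such a scheme can in principle be reused for any number of characters at inference time, which is consistent with the generalization behavior the abstract describes.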
Problem

Research questions and friction points this paper is trying to address.

pose-guided animation
multi-character animation
identity confusion
occlusion handling
diffusion-based video generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-character animation
Diffusion Transformers
Identifier Assigner
Identifier Adapter
Pose-guided generation
👥 Authors
Yingcheng Hu
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences; ShanghaiTech University; Shanghai Jiao Tong University

Haowen Gong
Shanghai Jiao Tong University

Chuanguang Yang
Institute of Computing Technology, Chinese Academy of Sciences
Computer Vision · Knowledge Distillation · Representation Learning

Zhulin An
Institute of Computing Technology, Chinese Academy of Sciences
Automatic Deep Learning · Lifelong Learning

Yongjun Xu
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences

Songhua Liu
Shanghai Jiao Tong University
Computer Vision · Machine Learning