🤖 AI Summary
To address identity misalignment caused by position swaps among multiple characters in animated sequences, this paper proposes the first framework dedicated to Identity Correspondence (IC) modeling and optimization. Methodologically, we introduce the Identity Matching Graph (IMG), a novel weighted complete bipartite graph, to formulate IC correctness as a differentiable graph-structural metric; we further incorporate Mask-Query Attention (MQA), identity-embedded guidance, multi-scale matching, pre-classified sampling, and graph-structure-aware optimization. Our contributions are threefold: (1) the first dedicated IC evaluation benchmark; (2) state-of-the-art performance, with an over 40% reduction in IC error rate and significantly improved visual quality; and (3) robust generalization across complex swapping scenarios, empirically validated on diverse animation configurations.
📝 Abstract
Consistent pose-driven character animation has achieved remarkable progress in single-character scenarios. However, extending these advances to multi-character settings is non-trivial, especially when characters swap positions. Beyond mere scaling, the core challenge lies in enforcing correct Identity Correspondence (IC) between characters in reference and generated frames. To address this, we introduce EverybodyDance, a systematic solution targeting IC correctness in multi-character animation. EverybodyDance is built around the Identity Matching Graph (IMG), which models characters in the generated and reference frames as two node sets in a weighted complete bipartite graph. Edge weights, computed via our proposed Mask-Query Attention (MQA), quantify the affinity between each pair of characters. Our key insight is to formalize IC correctness as a graph structural metric and to optimize it during training. We also propose a series of targeted strategies tailored for multi-character animation, including identity-embedded guidance, a multi-scale matching strategy, and pre-classified sampling, which work synergistically. Finally, to evaluate IC performance, we curate the Identity Correspondence Evaluation benchmark, dedicated to multi-character IC correctness. Extensive experiments demonstrate that EverybodyDance substantially outperforms state-of-the-art baselines in both IC and visual fidelity.
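The abstract does not give the exact formulation of the IMG or of MQA, so the following is only a rough sketch of one way such a graph could be built, not the authors' implementation. It assumes per-pixel feature vectors for the generated and reference frames and binary per-character masks; the function names `mask_pooled_affinity` and `ic_score`, the softmax-attention pooling, and the "share of weight on matched edges" metric are all hypothetical choices made for illustration.

```python
import numpy as np

def mask_pooled_affinity(gen_feats, ref_feats, gen_masks, ref_masks):
    """Hypothetical edge weights for a bipartite identity graph.

    gen_feats: (Nq, d) features of generated-frame pixels (queries)
    ref_feats: (Nk, d) features of reference-frame pixels (keys)
    gen_masks: (C, Nq) binary masks, one row per generated character
    ref_masks: (C, Nk) binary masks, one row per reference character
    Returns W of shape (C, C); W[i, j] is the average attention mass that
    pixels of generated character i place on reference character j.
    """
    d = gen_feats.shape[1]
    logits = gen_feats @ ref_feats.T / np.sqrt(d)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)        # rows sum to 1

    # Attention mass each query pixel sends to each reference character.
    mass = attn @ ref_masks.T.astype(float)              # (Nq, C)
    # Average that mass over the pixels of each generated character.
    counts = np.maximum(gen_masks.sum(axis=1, keepdims=True), 1)
    return (gen_masks.astype(float) @ mass) / counts     # (C, C)

def ic_score(W, eps=1e-8):
    """Assumed structural metric: share of edge weight on matched (i, i) edges."""
    return float(np.trace(W) / (W.sum() + eps))

# Toy example: two characters with distinct features. In the reference
# frame the characters appear in swapped pixel positions.
gen_feats = np.array([[10.0, 0.0]] * 3 + [[0.0, 10.0]] * 3)
ref_feats = np.array([[0.0, 10.0]] * 3 + [[10.0, 0.0]] * 3)
gen_masks = np.array([[1, 1, 1, 0, 0, 0],
                      [0, 0, 0, 1, 1, 1]])
ref_masks = np.array([[0, 0, 0, 1, 1, 1],   # character 0 sits on the right
                      [1, 1, 1, 0, 0, 0]])  # character 1 sits on the left

W = mask_pooled_affinity(gen_feats, ref_feats, gen_masks, ref_masks)
score = ic_score(W)

# Mislabeling the reference masks simulates an identity mix-up.
W_swapped = mask_pooled_affinity(gen_feats, ref_feats, gen_masks, ref_masks[::-1])
score_swapped = ic_score(W_swapped)
```

In this toy setup the score is close to 1 when identities correspond correctly despite the position swap and close to 0 when they are mixed up. Because every step is built from differentiable operations, a quantity like `1 - ic_score(W)` could in principle serve as a training penalty in an autodiff framework, which is in the spirit of optimizing IC correctness during training as the abstract describes.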