CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation

📅 2026-01-16
🤖 AI Summary
Existing methods struggle to animate multiple subjects of arbitrary number, type, and spatial arrangement, particularly when the reference image and the driving poses are spatially misaligned. To address this, the work proposes CoDance, a framework built on an "Unbind-Rebind" paradigm: a pose shift encoder decouples poses from their rigid spatial correspondence with the reference image, and random pose perturbation pushes the model toward a position-invariant motion representation. Dual guidance from text semantics and subject-specific masks then associates the motion with the intended subjects. Evaluated on the newly introduced CoDanceBench and on existing datasets, CoDance achieves state-of-the-art performance and generalizes across diverse subject types and complex spatial layouts.

πŸ“ Abstract
Character image animation is gaining significant importance across various domains, driven by the demand for robust and flexible multi-subject rendering. While existing methods excel in single-person animation, they struggle to handle arbitrary subject counts, diverse character types, and spatial misalignment between the reference image and the driving poses. We attribute these limitations to an overly rigid spatial binding that forces strict pixel-wise alignment between the pose and reference, and an inability to consistently rebind motion to intended subjects. To address these challenges, we propose CoDance, a novel Unbind-Rebind framework that enables the animation of arbitrary subject counts, types, and spatial configurations conditioned on a single, potentially misaligned pose sequence. Specifically, the Unbind module employs a novel pose shift encoder to break the rigid spatial binding between the pose and the reference by introducing stochastic perturbations to both poses and their latent features, thereby compelling the model to learn a location-agnostic motion representation. To ensure precise control and subject association, we then devise a Rebind module, leveraging semantic guidance from text prompts and spatial guidance from subject masks to direct the learned motion to intended characters. Furthermore, to facilitate comprehensive evaluation, we introduce a new multi-subject CoDanceBench. Extensive experiments on CoDanceBench and existing datasets show that CoDance achieves SOTA performance, exhibiting remarkable generalization across diverse subjects and spatial layouts. The code and weights will be open-sourced.
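
The abstract's idea of perturbing poses so that motion no longer depends on absolute position can be illustrated with a small sketch. This is not the authors' implementation: the function name `perturb_pose_sequence`, the `(frames, joints, 2)` keypoint layout, normalized coordinates, and the shift and scale ranges are all assumptions made for illustration, and the paper additionally perturbs latent features, which is not shown here.

```python
import numpy as np

def perturb_pose_sequence(keypoints, rng, max_shift=0.2, scale_range=(0.8, 1.2)):
    """Apply one random shift and scale to a whole pose sequence.

    keypoints: array of shape (frames, joints, 2), assumed to hold
    normalized (x, y) coordinates. All parameters are hypothetical.
    """
    keypoints = np.asarray(keypoints, dtype=np.float64)
    # Sample a single offset and scale for the entire clip, so the
    # relative motion between frames is kept while the absolute
    # placement in the image changes.
    shift = rng.uniform(-max_shift, max_shift, size=2)
    scale = rng.uniform(*scale_range)
    # Scale about the sequence centroid, then translate.
    center = keypoints.reshape(-1, 2).mean(axis=0)
    return (keypoints - center) * scale + center + shift

rng = np.random.default_rng(0)
poses = rng.uniform(0.3, 0.7, size=(4, 17, 2))  # 4 frames, 17 joints
shifted = perturb_pose_sequence(poses, rng)
print(shifted.shape)
```

Because the same offset and scale apply to every frame, a model trained on such perturbed poses cannot rely on the pose overlapping the reference subject pixel for pixel, which is the intuition behind the Unbind step as the abstract describes it.
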
Problem

Research questions and friction points this paper is trying to address.

multi-subject animation
spatial misalignment
character image animation
arbitrary subject counts
motion rebind
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unbind-Rebind paradigm
pose shift encoder
location-agnostic motion representation
multi-subject animation
semantic-spatial guidance