🤖 AI Summary
This work addresses the problem of generating controllable and temporally consistent 4D character animations—i.e., dynamic 3D characters—from a single input character image and a sequence of 2D poses. We propose a neighborhood-constrained 4D Gaussian Splatting optimization framework augmented with a dual-attention module, integrating DiT-based image-to-video priors, camera geometry constraints, and multi-view consistency modeling to achieve spatiotemporally stable reconstructions across views and frames. Key contributions include: (1) a dual-attention mechanism that jointly models pose-appearance correlations and spatiotemporal dependencies; and (2) neighborhood constraints enforcing geometric continuity and identity preservation in the 4D Gaussian representation. Evaluated on our newly introduced Character4D dataset and the benchmark CharacterBench, our method outperforms state-of-the-art approaches, achieving significant improvements in animation coherence, pose fidelity, and cross-view consistency.
📝 Abstract
In this paper, we propose **CharacterShot**, a controllable and consistent 4D character animation framework that enables any individual designer to create dynamic 3D characters (i.e., 4D character animation) from a single reference character image and a 2D pose sequence. We begin by pretraining a powerful 2D character animation model based on a cutting-edge DiT-based image-to-video model, which accepts any 2D pose sequence as the control signal. We then lift the animation model from 2D to 3D by introducing a dual-attention module together with a camera prior to generate multi-view videos with spatial-temporal and spatial-view consistency. Finally, we employ a novel neighbor-constrained 4D Gaussian splatting optimization on these multi-view videos, resulting in continuous and stable 4D character representations. Moreover, to improve character-centric performance, we construct a large-scale dataset, Character4D, containing 13,115 unique characters with diverse appearances and motions, rendered from multiple viewpoints. Extensive experiments on our newly constructed benchmark, CharacterBench, demonstrate that our approach outperforms current state-of-the-art methods. Code, models, and datasets will be publicly available at https://github.com/Jeoyal/CharacterShot.