AI Summary
This paper addresses the challenging problem of automatic 3D animation generation from 2D hand-drawn storyboards. We propose the first end-to-end cross-domain alignment framework for this task. Methodologically, we introduce a novel alignment mechanism that projects 2D sketches and 3D motion sequences into a shared latent embedding space. A multi-condition motion generator is designed to jointly leverage 3D key poses, joint trajectories, and action-semantic text as controllable inputs. Building upon motion diffusion models, we develop a conditional generation architecture augmented with a neural mapping module to explicitly enforce sketch–motion semantic consistency. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art methods across multiple benchmarks. A user study confirms that the generated animations faithfully reflect sketch intent, achieving high visual fidelity and fine-grained editability.
Abstract
Storyboarding is widely used for creating 3D animations. Animators use the 2D sketches in storyboards as references to craft the desired 3D animations through a trial-and-error process. This traditional approach requires exceptional expertise and is both labor-intensive and time-consuming. Consequently, there is high demand for automated methods that can directly translate 2D storyboard sketches into 3D animations. This task remains under-explored to date; inspired by the significant advancements of motion diffusion models, we propose to address it from the perspective of conditional motion synthesis. We thus present Sketch2Anim, composed of two key modules for sketch constraint understanding and motion generation. Specifically, due to the large domain gap between 2D sketches and 3D motions, instead of directly conditioning on 2D inputs, we design a 3D conditional motion generator that simultaneously leverages 3D keyposes, joint trajectories, and action words to achieve precise and fine-grained motion control. We then introduce a neural mapper dedicated to aligning user-provided 2D sketches with their corresponding 3D keyposes and trajectories in a shared embedding space, enabling, for the first time, direct 2D control of motion generation. Our approach successfully translates storyboards into high-quality 3D motions and inherently supports direct 3D animation editing, thanks to the flexibility of our multi-conditional motion generator. Comprehensive experiments, evaluations, and a user perceptual study demonstrate the effectiveness of our approach.
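The shared-embedding alignment behind the neural mapper can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's actual architecture: two linear encoders (one for 2D sketch features, one for 3D keypose features) project into a common latent space, and an InfoNCE-style contrastive loss pulls matching sketch/keypose pairs together while pushing mismatched pairs apart. All dimensions, weights, and the temperature value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Linear encoder followed by L2 normalization onto the unit sphere."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def info_nce(z_sketch, z_pose, temperature=0.07):
    """Contrastive alignment loss: row i of each batch is a matched pair."""
    logits = (z_sketch @ z_pose.T) / temperature          # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # cross-entropy on matched pairs

# Illustrative shapes: 66 = 22 joints x 3 coordinates is an assumption.
N, d_sketch, d_pose, d_latent = 8, 128, 66, 32
sketch_feats = rng.normal(size=(N, d_sketch))             # stand-in for 2D sketch features
pose_feats = rng.normal(size=(N, d_pose))                 # stand-in for 3D keypose features

W_sketch = rng.normal(size=(d_sketch, d_latent)) * 0.1    # untrained encoder weights
W_pose = rng.normal(size=(d_pose, d_latent)) * 0.1

loss = info_nce(encode(sketch_feats, W_sketch), encode(pose_feats, W_pose))
print(f"alignment loss: {loss:.4f}")
```

Minimizing such a loss over paired sketch/keypose data would drive each sketch embedding toward its corresponding keypose embedding, which is what allows a 2D sketch to stand in for a 3D condition at inference time.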