🤖 AI Summary
Existing controllable motion generation methods rely on dense spatiotemporal specifications, such as full pelvis trajectories with exact per-frame timing, which makes control rigid and editing costly. To address this, we propose a sparse, time-agnostic keyjoint control paradigm that drives full-body motion from a small set of keyjoint signals, such as end-effector positions. Our method uses a two-stage decoupled diffusion framework: the first stage completes dense keyjoint trajectories from the sparse control signals, and the second synthesizes natural full-body motion that satisfies them. We further introduce a time-agnostic control formulation that removes the need for frame-level timing annotations, and we support functional constraints on a subset of keyjoints for task-aware motion generation. Experiments across multiple datasets and scenarios show improvements in control intuitiveness, editing flexibility, and goal-directed accuracy, making the approach a practical basis for expressive, controllable animation synthesis.
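The decoupled two-stage design can be illustrated as a data-flow sketch. This is not the authors' implementation: both diffusion models are replaced by hypothetical placeholder functions (`complete_keyjoints`, `synthesize_full_body`), and the keyjoint set, skeleton, and rest-pose offsets are assumptions made only so the example runs. What it shows is the interface: sparse, time-agnostic targets go into stage 1, dense low-dimensional keyjoint trajectories come out, and a shared stage 2 maps those to every joint.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]

# Hypothetical keyjoint set (pelvis plus end effectors); the paper's exact set may differ.
KEYJOINTS = ("pelvis", "left_hand", "right_hand", "left_foot", "right_foot")

# Hypothetical rest-pose offsets of two non-key joints relative to the pelvis.
REST_OFFSETS = {"spine": (0.0, 0.25, 0.0), "head": (0.0, 0.6, 0.0)}


@dataclass
class KeyjointTarget:
    """Sparse, time-agnostic control: `joint` should reach `position`
    at some frame; no frame index is given."""
    joint: str
    position: Vec3


def lerp(a: Vec3, b: Vec3, s: float) -> Vec3:
    return tuple(x + (y - x) * s for x, y in zip(a, b))


def complete_keyjoints(start: dict[str, Vec3], targets: list[KeyjointTarget],
                       num_frames: int) -> list[dict[str, Vec3]]:
    """Stage 1 placeholder (hypothetical): densify sparse targets into
    per-frame keyjoint positions. A diffusion model would also decide when
    each target is reached; here constrained joints move linearly to their
    target over the whole clip and unconstrained joints stay where they are."""
    goal = {t.joint: t.position for t in targets}
    frames = []
    for f in range(num_frames):
        s = f / (num_frames - 1) if num_frames > 1 else 1.0
        frames.append({j: lerp(start[j], goal.get(j, start[j]), s) for j in KEYJOINTS})
    return frames


def synthesize_full_body(keyjoint_frames: list[dict[str, Vec3]]) -> list[dict[str, Vec3]]:
    """Stage 2 placeholder (hypothetical): map dense keyjoint trajectories to
    every joint. The paper's second stage is a shared diffusion model; here
    non-key joints are simply attached rigidly to the pelvis."""
    body = []
    for keyjoints in keyjoint_frames:
        pose = dict(keyjoints)
        pelvis = keyjoints["pelvis"]
        for name, offset in REST_OFFSETS.items():
            pose[name] = tuple(p + o for p, o in zip(pelvis, offset))
        body.append(pose)
    return body


# Usage: one sparse goal ("the right hand reaches this point"), no timing given.
start = {
    "pelvis": (0.0, 1.0, 0.0),
    "left_hand": (-0.3, 1.0, 0.0),
    "right_hand": (0.3, 1.0, 0.0),
    "left_foot": (-0.1, 0.0, 0.0),
    "right_foot": (0.1, 0.0, 0.0),
}
targets = [KeyjointTarget("right_hand", (0.5, 1.5, 0.4))]
motion = synthesize_full_body(complete_keyjoints(start, targets, num_frames=30))
print(len(motion), sorted(motion[0]))
```

The point of the split is that stage 1 only ever handles a handful of joints, so swapping the control signal type touches a low-dimensional problem, while stage 2 can stay the same regardless of how the keyjoint trajectories were produced.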
📝 Abstract
Creating expressive character animations is labor-intensive, requiring animators to make intricate manual adjustments across space and time. Previous works on controllable motion generation often rely on a predefined set of dense spatio-temporal specifications (e.g., dense pelvis trajectories with exact per-frame timing), which limits their practicality for animators. To support high-level intent and intuitive control in diverse scenarios, we propose a practical controllable motion synthesis framework that respects sparse and flexible keyjoint signals. Our approach decomposes diffusion-based motion synthesis into two stages: it first synthesizes keyjoint movements from sparse input control signals and then synthesizes full-body motion from the completed keyjoint trajectories. The low-dimensional keyjoint movements can easily adapt to various control signal types, such as end-effector positions for diverse goal-driven motion synthesis, or incorporate functional constraints on a subset of keyjoints. Additionally, we introduce a time-agnostic control formulation that eliminates the need for frame-specific timing annotations and enhances control flexibility. The shared second stage then converts the dense keyjoint movements into natural whole-body motion that precisely satisfies the task requirements. We demonstrate the effectiveness of sparse and flexible keyjoint control through comprehensive experiments on diverse datasets and scenarios.
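The time-agnostic control formulation can be made concrete by contrasting two ways of scoring whether a generated keyjoint trajectory meets a positional goal. The abstract does not state the paper's exact formulation: the minimum-over-frames score below is an assumed, illustrative choice, and `timed_error` and `time_agnostic_error` are hypothetical names.

```python
import math

Vec3 = tuple[float, float, float]


def timed_error(traj: list[Vec3], target: Vec3, frame: int) -> float:
    """Frame-specific constraint: the joint must be at `target` exactly at `frame`."""
    return math.dist(traj[frame], target)


def time_agnostic_error(traj: list[Vec3], target: Vec3) -> tuple[float, int]:
    """Time-agnostic constraint (assumed formulation): the joint only has to
    reach `target` at some frame, so score the closest frame.
    Returns (smallest distance, frame where it occurs)."""
    best = min(range(len(traj)), key=lambda f: math.dist(traj[f], target))
    return math.dist(traj[best], target), best


# A hand sliding along x; it passes through the target at frame 12.
traj = [(0.02 * f, 1.0, 0.0) for f in range(30)]
target = (0.24, 1.0, 0.0)

# With a guessed frame (20) the trajectory is penalized even though it does
# reach the target; the time-agnostic score locates frame 12 by itself.
print(timed_error(traj, target, frame=20))
print(time_agnostic_error(traj, target))
```

A hard minimum is not differentiable with respect to which frame is selected; if a score like this were used for gradient-based guidance or training, a soft-min (for example, a temperature-scaled log-sum-exp over per-frame distances) would be a common smooth substitute.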