🤖 AI Summary
This work addresses the challenge of generating high-fidelity character animations from sparse, coarse, and imprecisely timed blocking poses. Existing diffusion models can correct imprecise timing, but they offer no good way to use the diffusion prior to add pose detail to a sparse set of blocking poses. To this end, we propose an inference-time technique: at certain diffusion steps, the outputs of an unconditioned diffusion model are blended with the input blocking pose constraints using per-blocking-pose tolerance weights, and the blended result is passed as the input condition to a pre-existing motion retiming model. We find that this works significantly better than existing approaches that add detail by blending model outputs or by expressing blocking pose constraints as guidance. The result is the first diffusion model that can robustly convert blocking-level poses into plausible, detailed character animations.
📝 Abstract
We focus on the problem of using generative diffusion models for the task of motion detailing: converting a rough version of a character animation, represented by a sparse set of coarsely posed and imprecisely timed blocking poses, into a detailed, natural-looking character animation. Current diffusion models can correct the timing of imprecisely timed poses, but we find that no good solution exists for leveraging the diffusion prior to enhance a sparse set of blocking poses with additional pose detail. We overcome this challenge using a simple inference-time trick. At certain diffusion steps, we blend the outputs of an unconditioned diffusion model with input blocking pose constraints using per-blocking-pose tolerance weights, and pass this result in as the input condition to a pre-existing motion retiming model. We find this approach works significantly better than existing approaches that attempt to add detail by blending model outputs or by expressing blocking pose constraints as guidance. The result is the first diffusion model that can robustly convert blocking-level poses into plausible detailed character animations.
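The blending step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the blending formula, so the linear interpolation below is an assumption, as are the function name `blend_condition`, the flat-list pose layout, and the convention that a tolerance of 0 keeps the blocking pose exactly while a tolerance of 1 fully trusts the unconditioned prediction.

```python
from typing import Dict, List, Sequence

Pose = List[float]  # flattened joint parameters for one frame (assumed layout)


def blend_condition(
    unconditional: Sequence[Pose],
    blocking: Dict[int, Pose],
    tolerance: Dict[int, float],
) -> Dict[int, Pose]:
    """Blend each blocking pose with the unconditioned prediction at its frame.

    tolerance[frame] = 0.0 keeps the blocking pose unchanged; 1.0 replaces it
    with the unconditioned prediction. Linear interpolation is an assumption,
    not a formula taken from the paper.
    """
    blended: Dict[int, Pose] = {}
    for frame, pose in blocking.items():
        w = tolerance[frame]
        if not 0.0 <= w <= 1.0:
            raise ValueError(f"tolerance for frame {frame} must be in [0, 1]")
        prediction = unconditional[frame]
        blended[frame] = [(1.0 - w) * b + w * u for b, u in zip(pose, prediction)]
    return blended


# Toy example: 4 frames, 2 values per pose, blocking poses at frames 0 and 3.
unconditional = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [4.0, 8.0]]
blocking = {0: [1.0, 1.0], 3: [0.0, 0.0]}
tolerance = {0: 0.0, 3: 0.5}
condition = blend_condition(unconditional, blocking, tolerance)
```

In the described method, a blended set of poses like `condition` would be computed only at certain diffusion steps and then supplied as the input condition to the pre-existing motion retiming model; which steps are used and how the tolerance weights are chosen are not specified in the abstract.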