AI Summary
Existing methods for long-horizon human motion generation struggle to produce coherent transitions across semantically distinct motion domains. This work proposes an inference-time, diffusion-based optimization framework inspired by stochastic optimal control theory, which explicitly regularizes the transition trajectories of a pretrained diffusion model through a control-energy objective. This approach is the first to enable universally controllable generation that supports smooth transitions between motions with diverse styles and semantics. The method significantly improves the fidelity and temporal consistency of generated motions, making it well suited to complex long-sequence generation tasks such as choreography.
Abstract
Long-horizon human motion generation remains a central challenge in computer vision and graphics, and generating coherent transitions across semantically distinct motion domains is still largely unexplored. This capability is particularly important for applications such as dance choreography, where movements must fluidly transition across diverse stylistic and semantic motifs. We propose a simple and effective inference-time optimization framework inspired by diffusion-based stochastic optimal control. Specifically, we introduce a control-energy objective that explicitly regularizes the transition trajectories of a pretrained diffusion model. We show that optimizing this objective at inference time yields transitions with high fidelity and temporal coherence. To our knowledge, this is the first work to provide a general framework for controlled long-horizon human motion generation with explicit transition modeling.
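The abstract does not give the optimization details, but the idea of adding controls to a pretrained sampler and penalizing their energy at inference time can be sketched on a toy problem. Everything below is an illustrative assumption, not the paper's method: `denoise_step` is a stand-in for the pretrained diffusion denoiser, the sampler is deterministic, the terminal cost matches a transition target frame, and gradients are taken by finite differences rather than backpropagation through the model.

```python
import numpy as np

def denoise_step(x, t, T):
    """Toy stand-in for one step of a pretrained diffusion denoiser
    (assumption: the real model would run here)."""
    return x * (1.0 - 1.0 / (T - t + 1))

def rollout(x_T, controls, T):
    """Sampling pass with an additive control applied at each step."""
    x = x_T
    for t in range(T, 0, -1):
        x = denoise_step(x, t, T) + controls[t - 1]
    return x

def control_energy_objective(x_T, controls, target, T, lam=0.1):
    """Terminal transition cost plus a control-energy regularizer,
    mirroring the stochastic-optimal-control objective in spirit:
    J(u) = ||x_0(u) - target||^2 + lam * sum_t ||u_t||^2."""
    x0 = rollout(x_T, controls, T)
    return np.sum((x0 - target) ** 2) + lam * np.sum(controls ** 2)

def optimize_controls(x_T, target, T, steps=200, lr=0.05, lam=0.1, eps=1e-4):
    """Inference-time optimization of the controls by finite-difference
    gradient descent on the control-energy objective."""
    controls = np.zeros((T,) + x_T.shape)
    for _ in range(steps):
        base = control_energy_objective(x_T, controls, target, T, lam)
        grad = np.zeros_like(controls)
        for idx in np.ndindex(controls.shape):
            perturbed = controls.copy()
            perturbed[idx] += eps
            grad[idx] = (control_energy_objective(
                x_T, perturbed, target, T, lam) - base) / eps
        controls -= lr * grad
    return controls
```

Usage on a two-dimensional toy "frame": the optimized controls drive the rollout toward the target transition while the energy term keeps them small, so the objective drops well below its zero-control value.

```python
rng = np.random.default_rng(0)
x_T = rng.standard_normal(2)
target = np.array([1.0, -0.5])
u = optimize_controls(x_T, target, T=4)
```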