🤖 AI Summary
Existing methods struggle to generate human motion that satisfies stringent spatiotemporal constraints, such as severe obstacle avoidance or precise step-count control. This work proposes a retrieval-guided method built on the training-free diffusion noise optimization framework. Relational task parsing, carried out by a large language model, groups the target constraints and identifies the difficult ones. The approach then retrieves motion references that can potentially satisfy those constraints from a large-scale motion library and uses a reward-guided mask to combine retrieved noise with random noise, giving the diffusion model a better initial noise from which to optimize the zero-shot objective function. For the first time, this method enables efficient generation of highly constrained human motion sequences without any model training, significantly outperforming existing approaches across multiple challenging tasks.
📝 Abstract
Generating human motion that satisfies customized zero-shot goal functions is a critical capability, enabling applications such as controllable character animation and behavior synthesis for virtual agents. While current approaches handle many unseen constraints, they fail on tasks with very challenging spatiotemporal restrictions, such as severe spatial obstacles or a specified number of walking steps. To equip motion generators for these highly constrained tasks, we present a retrieval-guided method built on the training-free diffusion noise optimization framework. The key idea is to search large motion datasets for guidance that can potentially satisfy difficult constraints. We introduce relational task parsing to group target constraints and identify the difficult ones, which are then handled by a retrieved reference. A better initialization for the diffusion noise is then obtained via a reward-guided mask that combines random noise with retrieved noise. By optimizing diffusion noise from this improved initialization, we successfully solve highly constrained generation tasks. By leveraging an LLM for relational task parsing, the framework can also reason automatically about what to retrieve, improving the intelligence of moving agents under a training-free optimization scheme.
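The abstract does not say how the reward-guided mask is built, so the following is only a minimal sketch of one way the noise-blending step could look. It assumes a binary per-frame mask obtained by thresholding a per-frame reward of the retrieved reference. The function names `reward_guided_mask` and `blend_initial_noise`, the threshold rule, the array shapes, and the reward values are all hypothetical and do not come from the paper.

```python
import numpy as np


def reward_guided_mask(frame_rewards, threshold):
    """Binary per-frame mask (assumed rule): 1 where the retrieved
    reference scores at or above the threshold, 0 elsewhere."""
    return (np.asarray(frame_rewards) >= threshold).astype(np.float64)


def blend_initial_noise(random_noise, retrieved_noise, frame_mask):
    """Use retrieved noise on masked frames and random noise on the rest."""
    m = frame_mask[:, None]  # broadcast the frame mask over the feature axis
    return m * retrieved_noise + (1.0 - m) * random_noise


rng = np.random.default_rng(0)
num_frames, feat_dim = 6, 4  # toy sizes, chosen for illustration only

random_noise = rng.standard_normal((num_frames, feat_dim))
# Stand-in for noise associated with a retrieved reference motion.
retrieved_noise = rng.standard_normal((num_frames, feat_dim))
# Made-up per-frame rewards for the retrieved reference.
frame_rewards = np.array([0.9, 0.8, 0.2, 0.1, 0.7, 0.3])

mask = reward_guided_mask(frame_rewards, threshold=0.5)
init_noise = blend_initial_noise(random_noise, retrieved_noise, mask)
```

In this sketch `init_noise` would serve as the starting point for diffusion noise optimization in place of purely random noise. A binary mask is used so that each frame's noise comes entirely from one source, whereas a soft mask would mix the two and change the noise variance.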