Towards Highly-Constrained Human Motion Generation with Retrieval-Guided Diffusion Noise Optimization

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing methods struggle to generate human motions that satisfy stringent spatiotemporal constraints, such as avoiding severe spatial obstacles or walking a specified number of steps. This work proposes a retrieval-guided method built on the training-free diffusion noise optimization framework. Relational task parsing, performed with a large language model, groups the target constraints and identifies the difficult ones. For those, the method retrieves potentially feasible reference motions from large motion datasets, then uses a reward-guided mask to combine the retrieved noise with random noise into a better initial noise for the diffusion model. Optimizing the noise from this initialization against the zero-shot goal functions produces highly constrained motion sequences without any model training, solving tasks on which existing approaches fail.

📝 Abstract

Generating human motion that satisfies customized zero-shot goal functions, enabling applications such as controllable character animation and behavior synthesis for virtual agents, is a critical capability. While current approaches handle many unseen constraints, they fail on tasks with very challenging spatiotemporal restrictions, such as severe spatial obstacles or specified numbers of walking steps. To equip motion generators for these highly constrained tasks, we present a retrieval-guided method built on the training-free diffusion noise optimization framework. The key idea is to search within large motion datasets for guidance that can potentially satisfy difficult constraints. We introduce relational task parsing to group target constraints and identify the difficult ones to be handled by the retrieved reference. A better initialization for diffusion noise is then obtained via a reward-guided mask that combines random noise with retrieved noise. By optimizing diffusion noise from this improved initialization, we successfully solve highly constrained generation tasks. By leveraging an LLM for relational task parsing, the whole framework is further enabled to automatically reason about what to retrieve, improving the intelligence of moving agents under a training-free optimization scheme.
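
The initialization and optimization steps described in the abstract can be pictured with the minimal sketch below. It is an illustration under stated assumptions, not the authors' implementation: the function names, the per-frame reward scores, the hard threshold used to build the mask, the linear toy "decoder", and the plain gradient-descent update are all hypothetical stand-ins. In the actual framework the noise would be optimized by backpropagating a goal function through a pretrained diffusion sampler.

```python
import numpy as np


def reward_guided_init(retrieved_noise, frame_reward, threshold=0.5, rng=None):
    """Blend retrieved noise with fresh Gaussian noise using a per-frame mask.

    retrieved_noise: (T, D) noise assumed to correspond to a retrieved reference.
    frame_reward:    (T,) scores in [0, 1]; higher means the reference frame is
                     assumed more useful for the difficult constraints.
    """
    rng = np.random.default_rng() if rng is None else rng
    random_noise = rng.standard_normal(retrieved_noise.shape)
    # Assumption: a hard threshold on the reward decides which frames keep
    # the retrieved noise; all other frames start from random noise.
    mask = (frame_reward >= threshold).astype(retrieved_noise.dtype)[:, None]
    init = mask * retrieved_noise + (1.0 - mask) * random_noise
    return init, mask[:, 0]


def make_toy_goal(target, weight):
    """Toy goal function: the linearly 'decoded' motion should match target.

    The linear map is a placeholder for the diffusion sampler.
    """
    def loss_and_grad(z):
        residual = z @ weight - target
        loss = float(np.sum(residual ** 2))
        grad = 2.0 * residual @ weight.T  # analytic gradient w.r.t. the noise
        return loss, grad
    return loss_and_grad


def optimize_noise(noise, loss_and_grad, steps=100, lr=0.05):
    """Plain gradient descent on the noise (a stand-in for the real optimizer)."""
    z = noise.copy()
    for _ in range(steps):
        _, grad = loss_and_grad(z)
        z -= lr * grad
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D = 8, 4  # frames and feature dimension, chosen arbitrarily
    retrieved = rng.standard_normal((T, D))
    reward = np.linspace(0.0, 1.0, T)  # made-up reward scores
    init, mask = reward_guided_init(retrieved, reward, rng=rng)
    goal = make_toy_goal(np.ones((T, D)), np.eye(D))
    optimized = optimize_noise(init, goal)
    print("frames taken from retrieval:", int(mask.sum()))
    print("loss before:", goal(init)[0], "loss after:", goal(optimized)[0])
```

The hard mask is only one possible choice. A soft blend weighted by the reward would fit the same description, and the abstract does not specify which form the reward-guided mask takes.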

Problem

Research questions and friction points this paper is trying to address.

human motion generation

highly-constrained tasks

spatiotemporal constraints

zero-shot goal functions

controllable animation

Innovation

Methods, ideas, or system contributions that make the work stand out.

retrieval-guided diffusion

noise optimization

relational task parsing