🤖 AI Summary
This work addresses the challenge of efficiently generating multi-agent motion in scenarios with a variable number of agents, where existing approaches suffer from error accumulation and high computational cost due to autoregressive modeling. The authors propose the Unified Motion Flow (UMF) framework, the first method capable of text-driven multi-human motion generation without requiring a predefined number of agents. UMF decouples motion synthesis into a single prior-generation step followed by multiple reactive-generation steps, enabling joint training on heterogeneous data within a unified latent space. It further introduces the Pyramid Motion Flow (P-Flow) and Semi-Noise Motion Flow (S-Flow) mechanisms to support hierarchical, conditional, and efficient motion synthesis. Experiments and user studies demonstrate that UMF significantly improves generation quality, generalization, and computational efficiency, establishing it as a general-purpose model for multi-human motion generation.
📝 Abstract
Generative models excel at motion synthesis for a fixed number of agents but struggle to generalize to a variable number of agents. Trained on limited, domain-specific data, existing methods employ autoregressive models that generate motion recursively, suffering from inefficiency and error accumulation. We propose Unified Motion Flow (UMF), which consists of Pyramid Motion Flow (P-Flow) and Semi-Noise Motion Flow (S-Flow). UMF decomposes number-free motion generation into a single-pass motion prior generation stage and multi-pass reaction generation stages. Specifically, UMF utilizes a unified latent space to bridge the distribution gap between heterogeneous motion datasets, enabling effective unified training. For motion prior generation, P-Flow operates on hierarchical resolutions conditioned on different noise levels, mitigating computational overhead. For reaction generation, S-Flow learns a joint probabilistic path that adaptively performs reaction transformation and context reconstruction, alleviating error accumulation. Extensive results and user studies demonstrate UMF's effectiveness as a generalist model for multi-person motion generation from text. Project page: https://githubhgh.github.io/umf/.
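The two-stage decomposition described in the abstract (a single prior pass, then one reaction pass per additional agent) can be sketched as a toy flow-based sampling loop. This is a minimal illustrative sketch, not the paper's method: the velocity field, embedding dimension, conditioning scheme, and step counts are all placeholder assumptions standing in for learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_step(x, cond, t):
    # Placeholder velocity field; in a real flow model this is a learned network.
    return (cond - x) * t

def integrate_flow(cond, steps=10, dim=8):
    # Euler integration of a toy probability-flow ODE from noise to a sample.
    x = rng.standard_normal(dim)
    for i in range(steps):
        t = (i + 1) / steps
        x = x + flow_step(x, cond, t) / steps
    return x

# Stage 1: single-pass motion prior in a shared latent space. The text
# condition here is a random vector standing in for a text encoder's output.
text_embedding = rng.standard_normal(8)
motion_prior = integrate_flow(text_embedding)

# Stage 2: one reaction pass per additional agent, each conditioned on the
# prior plus the already-generated agents' motions (the "context").
agents = [motion_prior]
for _ in range(3):  # the agent count can vary at inference time
    context = np.mean(agents, axis=0)
    agents.append(integrate_flow(0.5 * (motion_prior + context)))

print(len(agents))  # number of agents generated without fixing the count up front
```

The point of the sketch is only the control flow: the prior is generated once, and each reaction pass reuses it together with previously generated agents, so no fixed agent count is baked into the model.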