AI Summary
Existing physics-driven human motion generation methods rely heavily on physics simulators, resulting in high inference costs and poor parallelizability. Method: We propose SimDiff, the first framework that directly embeds environmental physical parameters (e.g., gravity, wind force) into the denoising network of a diffusion model. It formulates simulator-based motion projection as a differentiable guidance signal within the diffusion process, supporting both classifier-free and classifier-guided sampling. Contribution/Results: This design eliminates simulator calls during inference, which substantially improves efficiency while enabling fine-grained control over physical parameters and strong generalization to unseen physical environments. Experiments demonstrate that SimDiff generates high-fidelity, physically plausible human motion across diverse physical scenarios, achieves a several-fold inference speedup, and preserves both motion naturalness and dynamical consistency.
Abstract
Generating physically plausible human motion is crucial for applications such as character animation and virtual reality. Existing approaches often add a simulator-based motion projection layer to the diffusion process to enforce physical plausibility. However, such methods are computationally expensive because the simulator runs sequentially, which prevents parallelization. We show that simulator-based motion projection can be interpreted as a form of guidance, either classifier-based or classifier-free, within the diffusion process. Building on this insight, we propose SimDiff, a Simulator-constrained Diffusion Model that integrates environment parameters (e.g., gravity, wind) directly into the denoising process. By conditioning on these parameters, SimDiff generates physically plausible motions efficiently, without repeated simulator calls at inference, and also provides fine-grained control over individual physical coefficients. Moreover, SimDiff generalizes to unseen combinations of environmental parameters, demonstrating compositional generalization.
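As a rough illustration of the classifier-free case, the sketch below shows how a noise prediction conditioned on environment parameters can be blended with an unconditional one using the standard classifier-free guidance rule. It is not the SimDiff implementation: `cfg_noise`, `toy_denoiser`, the `"gravity"` key, and the guidance scale are all hypothetical names chosen for this example, and the toy denoiser merely stands in for a trained network.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical denoiser signature (an assumption for this sketch):
# (noisy motion x_t, timestep t, environment parameters or None) -> predicted noise
Denoiser = Callable[[Sequence[float], int, Optional[Dict[str, float]]], List[float]]


def cfg_noise(
    denoiser: Denoiser,
    x_t: Sequence[float],
    t: int,
    env: Dict[str, float],
    guidance_scale: float,
) -> List[float]:
    """Standard classifier-free guidance: eps = eps_u + w * (eps_c - eps_u)."""
    eps_uncond = denoiser(x_t, t, None)  # environment parameters dropped
    eps_cond = denoiser(x_t, t, env)     # conditioned on environment parameters
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def toy_denoiser(
    x_t: Sequence[float], t: int, env: Optional[Dict[str, float]]
) -> List[float]:
    """Stand-in for a trained network; NOT the SimDiff model."""
    base = [0.1 * x for x in x_t]
    if env is None:
        return base
    # Hypothetical effect: gravity shifts the predicted noise uniformly.
    shift = 0.01 * env["gravity"]
    return [b + shift for b in base]


if __name__ == "__main__":
    env = {"gravity": -9.8}  # illustrative value only
    print(cfg_noise(toy_denoiser, [1.0, 2.0], 10, env, guidance_scale=2.0))
```

With `guidance_scale=0.0` the result reduces to the unconditional prediction, and with `1.0` it matches the conditioned prediction. Larger values push the sample further toward the specified environment. Both network evaluations can be batched, so no sequential simulator call is needed at this step.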