🤖 AI Summary
Existing diffusion-based motion generation methods either lack physical plausibility or sacrifice controllability. This paper introduces Diffuse-CLoC, a guided diffusion framework for physics-based look-ahead control that synthesizes physically realistic, steerable, long-horizon character motion from a single conditional diffusion model over the joint distribution of states and actions. Its core insight is that jointly modeling states and actions makes action generation steerable by conditioning on the predicted states: established inference-time guidance techniques from kinematic motion generation carry over, the resulting motions remain physically realizable, and no external high-level planner is needed for multi-task, long-range planning. Experiments demonstrate that a single pre-trained Diffuse-CLoC model significantly outperforms hierarchical diffusion-plus-tracking baselines on long-horizon tasks, including static and dynamic obstacle avoidance, motion in-betweening, and task-space control.
📝 Abstract
We present Diffuse-CLoC, a guided diffusion framework for physics-based look-ahead control that enables intuitive, steerable, and physically realistic motion generation. While existing diffusion-based kinematic motion generation offers intuitive steering capabilities through inference-time conditioning, it often fails to produce physically viable motions. In contrast, recent diffusion-based control policies have shown promise in generating physically realizable motion sequences, but their lack of kinematic prediction limits their steerability. Diffuse-CLoC addresses these challenges through a key insight: modeling the joint distribution of states and actions within a single diffusion model makes action generation steerable by conditioning it on the predicted states. This approach allows us to leverage established conditioning techniques from kinematic motion generation while producing physically realistic motions. As a result, we achieve planning capabilities without the need for a high-level planner. Our method handles a diverse set of unseen long-horizon downstream tasks through a single pre-trained model, including static and dynamic obstacle avoidance, motion in-betweening, and task-space control. Experimental results show that our method significantly outperforms the traditional hierarchical framework of high-level motion diffusion and low-level tracking.
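To make the key insight concrete, the sketch below illustrates inference-time guidance on a jointly diffused state-action trajectory: a task cost is defined only on the predicted states, and its gradient steers the sample while the actions ride along in the same joint vector. This is a minimal toy, not the paper's implementation: the learned denoiser is replaced by a stand-in that shrinks the sample toward a zero prior, the iterative loop is a simplified deterministic denoising schedule, and names such as `toy_denoiser` and `guidance_cost_grad` are our own illustrative choices.

```python
import numpy as np

H, S, A = 8, 3, 2                    # horizon, state dim, action dim
rng = np.random.default_rng(0)
goal = np.array([1.0, 0.5, -0.2])    # hypothetical task-space target

def toy_denoiser(x):
    """Stand-in for the learned joint denoiser over [state | action] rows.
    The real model is trained on motion data; here it just predicts a
    zero 'clean' trajectory, so denoising shrinks the sample toward 0."""
    return np.zeros_like(x)

def guidance_cost_grad(x):
    """Gradient of a quadratic goal cost 0.5 * ||s_H - goal||^2 on the
    final predicted state. Only state channels are steered directly;
    actions are shaped implicitly because they are diffused jointly."""
    g = np.zeros_like(x)
    g[-1, :S] = x[-1, :S] - goal
    return g

def guided_sample(steps=200, guide_scale=0.1, shrink=0.02):
    """Simplified guided denoising: alternate a step toward the model's
    clean estimate with a gradient step on the state-space task cost."""
    x = rng.standard_normal((H, S + A))       # noisy joint trajectory
    for _ in range(steps):
        x = x - shrink * (x - toy_denoiser(x))   # denoise toward prior
        x = x - guide_scale * guidance_cost_grad(x)  # steer via states
    return x

traj = guided_sample()
# The final predicted state is pulled toward `goal`; the action columns
# stay coupled to the states through the joint representation.
print(traj[-1, :S])
```

The point of the toy is the mechanism, not the numbers: because states and actions share one diffusion process, a purely kinematic cost (obstacle avoidance, in-betweening keyframes, task-space targets) can steer the whole trajectory, and the corresponding actions come out of the same sample rather than from a separate tracking controller.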