🤖 AI Summary
Existing text-driven humanoid motion generators often produce infeasible or unsafe trajectories because they ignore physical constraints, and the problem worsens for out-of-distribution (OOD) instructions. This work proposes SafeFlow, a framework that uses Physics-Guided Rectified Flow Matching in a VAE latent space to generate executable motions, accelerates sampling with Reflow, and adds a 3-Stage Safety Gate for hierarchical risk control. The gates sequentially reject semantically OOD prompts (using a Mahalanobis score in text-embedding space), unstable generations (via a directional sensitivity discrepancy metric), and trajectories that violate hard kinematic constraints such as joint and velocity limits. In experiments on the Unitree G1, SafeFlow outperforms prior diffusion-based methods in success rate, physical compliance, and inference speed while maintaining motion diversity.
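To make the first gate concrete, here is a minimal sketch of a Mahalanobis-style OOD check on prompt embeddings. It is not the authors' implementation. The function names, the toy 2-D "embeddings", the threshold of 3.0, and the diagonal-covariance simplification are all assumptions made for illustration; a real system would fit a full covariance on embeddings from an actual text encoder.

```python
import math
from statistics import fmean, pvariance


def fit_diag_gaussian(embeddings):
    """Fit a per-dimension mean and variance (diagonal covariance, an assumption)."""
    dims = list(zip(*embeddings))
    mean = [fmean(d) for d in dims]
    # A small epsilon keeps the variance strictly positive.
    var = [pvariance(d) + 1e-6 for d in dims]
    return mean, var


def mahalanobis_score(x, mean, var):
    """Mahalanobis distance of x from the fitted Gaussian (diagonal case)."""
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))


def is_ood(x, mean, var, threshold):
    """Flag a prompt embedding as OOD when its score exceeds the threshold."""
    return mahalanobis_score(x, mean, var) > threshold


# Hypothetical in-distribution embeddings (toy 2-D values, not real encoder output).
train = [[0.0, 1.0], [0.2, 0.8], [-0.1, 1.1], [0.1, 0.9], [-0.2, 1.2]]
mean, var = fit_diag_gaussian(train)

print(is_ood([0.0, 1.0], mean, var, threshold=3.0))   # near the training mean
print(is_ood([5.0, -4.0], mean, var, threshold=3.0))  # far from the training data
```

In this sketch a prompt flagged as OOD would simply be rejected before any motion is generated, which mirrors the selective-execution role the summary assigns to the first gate.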
📝 Abstract
Recent advances in real-time interactive text-driven motion generation have enabled humanoids to perform diverse behaviors. However, kinematics-only generators often exhibit physical hallucinations, producing motion trajectories that are physically infeasible to track with a downstream motion tracking controller or unsafe for real-world deployment. These failures often arise from the lack of explicit physics-aware objectives for real-robot execution and become more severe under out-of-distribution (OOD) user inputs. Hence, we propose SafeFlow, a text-driven humanoid whole-body control framework that combines physics-guided motion generation with a 3-Stage Safety Gate driven by explicit risk indicators. SafeFlow adopts a two-level architecture. At the high level, we generate motion trajectories using Physics-Guided Rectified Flow Matching in a VAE latent space to improve real-robot executability, and further accelerate sampling via Reflow to reduce the number of function evaluations (NFE) for real-time control. The 3-Stage Safety Gate enables selective execution by detecting semantic OOD prompts using a Mahalanobis score in text-embedding space, filtering unstable generations via a directional sensitivity discrepancy metric, and enforcing final hard kinematic constraints such as joint and velocity limits before passing the generated trajectory to a low-level motion tracking controller. Extensive experiments on the Unitree G1 demonstrate that SafeFlow outperforms prior diffusion-based methods in success rate, physical compliance, and inference speed, while maintaining diverse expressiveness.
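The link between straight flows and a low NFE can be illustrated with a toy Euler sampler. This is a sketch under stated assumptions, not SafeFlow's sampler: `euler_sample`, `make_toy_velocity`, and the single fixed target latent are hypothetical stand-ins for a learned, text-conditioned velocity network. The toy field transports every point along a straight line to the target, which is the idealized case that Reflow tries to approach.

```python
def euler_sample(velocity, z0, nfe):
    """Integrate dz/dt = velocity(z, t) from t=0 to t=1 with `nfe` Euler steps."""
    z = list(z0)
    dt = 1.0 / nfe
    for k in range(nfe):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        v = velocity(z, t)
        z = [zi + dt * vi for zi, vi in zip(z, v)]
    return z


def make_toy_velocity(target):
    """Toy straight-line field: points at `target` with speed scaled by time left."""
    def velocity(z, t):
        return [(ti - zi) / (1.0 - t) for ti, zi in zip(target, z)]
    return velocity


target = [1.0, -2.0, 0.5]   # hypothetical latent to reach
z0 = [0.3, 0.7, -1.1]       # hypothetical noise sample
velocity = make_toy_velocity(target)

one_step = euler_sample(velocity, z0, nfe=1)
four_steps = euler_sample(velocity, z0, nfe=4)
print(one_step, four_steps)
```

Because the toy trajectories are exactly straight, a single function evaluation already lands on the target and extra steps add nothing. A learned field is only approximately straight, so in practice the number of steps trades accuracy against latency.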