SafeFlow: Real-Time Text-Driven Humanoid Whole-Body Control via Physics-Guided Rectified Flow and Selective Safety Gating

📅 2026-03-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing text-driven humanoid motion generators often produce infeasible or unsafe trajectories because they ignore physical constraints, and the problem worsens for out-of-distribution instructions. This work proposes SafeFlow, a framework that uses physics-guided rectified flow matching in a VAE latent space to generate feasible motions, combined with a three-stage safety gate for hierarchical risk control. The gates act in sequence: the first rejects semantically anomalous commands using a Mahalanobis score in text-embedding space, the second filters unstable generations via a directional sensitivity discrepancy metric, and the third enforces hard joint and velocity limits. Evaluated on the Unitree G1 platform, SafeFlow outperforms prior diffusion-based approaches in success rate, physical compliance, and inference speed while preserving motion diversity.
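
To make the first gate concrete, here is a minimal sketch of Mahalanobis-based OOD scoring on embedding vectors. It is an illustration under stated assumptions, not the paper's implementation: the random 8-dimensional vectors stand in for real text embeddings, and the helper names (`fit_gaussian`, `mahalanobis_score`, `is_ood`), the ridge term, and the 99th-percentile threshold are all hypothetical choices.

```python
import numpy as np

def fit_gaussian(embeddings, ridge=1e-3):
    """Fit a mean and precision matrix to in-distribution embeddings.

    The ridge term (an assumption) keeps the covariance invertible.
    """
    mean = embeddings.mean(axis=0)
    cov = np.cov(embeddings, rowvar=False)
    cov = cov + ridge * np.eye(embeddings.shape[1])
    return mean, np.linalg.inv(cov)

def mahalanobis_score(x, mean, precision):
    """Mahalanobis distance of one embedding from the fitted Gaussian."""
    d = x - mean
    return float(np.sqrt(d @ precision @ d))

def is_ood(x, mean, precision, threshold):
    """Flag an embedding as out-of-distribution when its score exceeds the threshold."""
    return mahalanobis_score(x, mean, precision) > threshold

# Synthetic stand-in for embeddings of training prompts (hypothetical data).
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 8))
mean, precision = fit_gaussian(train)

# Hypothetical threshold: 99th percentile of the training scores.
train_scores = np.array([mahalanobis_score(e, mean, precision) for e in train])
threshold = float(np.quantile(train_scores, 0.99))

near = np.zeros(8)        # close to the training mean
far = np.full(8, 10.0)    # far from every training sample
near_flag = is_ood(near, mean, precision, threshold)
far_flag = is_ood(far, mean, precision, threshold)
```

In a deployed system, a prompt flagged this way would be rejected before any motion is generated, so the later gates only see in-distribution commands.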

📝 Abstract

Recent advances in real-time interactive text-driven motion generation have enabled humanoids to perform diverse behaviors. However, kinematics-only generators often exhibit physical hallucinations, producing motion trajectories that are physically infeasible to track with a downstream motion tracking controller or unsafe for real-world deployment. These failures often arise from the lack of explicit physics-aware objectives for real-robot execution and become more severe under out-of-distribution (OOD) user inputs. Hence, we propose SafeFlow, a text-driven humanoid whole-body control framework that combines physics-guided motion generation with a 3-Stage Safety Gate driven by explicit risk indicators. SafeFlow adopts a two-level architecture. At the high level, we generate motion trajectories using Physics-Guided Rectified Flow Matching in a VAE latent space to improve real-robot executability, and further accelerate sampling via Reflow to reduce the number of function evaluations (NFE) for real-time control. The 3-Stage Safety Gate enables selective execution by detecting semantic OOD prompts using a Mahalanobis score in text-embedding space, filtering unstable generations via a directional sensitivity discrepancy metric, and enforcing final hard kinematic constraints such as joint and velocity limits before passing the generated trajectory to a low-level motion tracking controller. Extensive experiments on the Unitree G1 demonstrate that SafeFlow outperforms prior diffusion-based methods in success rate, physical compliance, and inference speed, while maintaining diverse expressiveness.
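
The link between rectified flow and a low NFE can be illustrated with a small sketch. Sampling integrates an ODE dz/dt = v(z, t) from t = 0 to t = 1, and each Euler step costs one function evaluation. When the flow paths are straight, very few steps suffice. The code below assumes a hand-written toy velocity field (`toy_velocity`) whose paths are exactly straight lines toward a fixed `target`; it stands in for the learned network and does not model the paper's physics guidance or VAE decoder. All names are hypothetical.

```python
import numpy as np

def euler_sample(velocity_fn, z0, nfe):
    """Integrate dz/dt = velocity_fn(z, t) from t=0 to t=1 with `nfe` Euler steps."""
    z = np.asarray(z0, dtype=float).copy()
    dt = 1.0 / nfe
    for k in range(nfe):
        t = k * dt  # t stays strictly below 1, so 1 - t is never zero
        z = z + dt * velocity_fn(z, t)
    return z

# Toy straight-line flow (assumption): every path moves linearly to `target`.
target = np.array([1.0, -2.0, 0.5])

def toy_velocity(z, t):
    """Velocity of the straight path z_t = (1 - t) * z_0 + t * target."""
    return (target - z) / (1.0 - t)

z0 = np.random.default_rng(0).normal(size=3)  # noise sample in a 3-D toy latent space
z_one_step = euler_sample(toy_velocity, z0, nfe=1)
z_four_step = euler_sample(toy_velocity, z0, nfe=4)
```

Because the toy paths are perfectly straight, one step and four steps land on the same endpoint. A learned velocity field is only approximately straight, which is the gap that Reflow is meant to narrow.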

Problem

Research questions and friction points this paper is trying to address.

text-driven motion generation

physical feasibility

safety

out-of-distribution prompts

humanoid control

Innovation

Methods, ideas, or system contributions that make the work stand out.

Physics-Guided Rectified Flow

3-Stage Safety Gate

Real-Time Whole-Body Control
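
The final stage of the 3-Stage Safety Gate enforces hard joint and velocity limits before a trajectory reaches the tracking controller. A minimal sketch of such a check follows. The function name `passes_hard_limits`, the finite-difference velocity estimate, and all limit values are assumptions for illustration; they are not the Unitree G1's actual limits or the paper's code.

```python
import numpy as np

def passes_hard_limits(q, q_min, q_max, qd_max, dt):
    """Return True if a joint trajectory respects position and velocity limits.

    q: array of shape (T, J) with joint positions over T time steps.
    Velocities are estimated by finite differences (an assumption).
    """
    within_pos = np.all((q >= q_min) & (q <= q_max))
    qd = np.diff(q, axis=0) / dt
    within_vel = np.all(np.abs(qd) <= qd_max)
    return bool(within_pos and within_vel)

# Hypothetical limits for a two-joint toy robot.
q_min = np.array([-1.0, -0.5])
q_max = np.array([1.0, 0.5])
qd_max = np.array([2.0, 2.0])
dt = 0.02

t = np.arange(0.0, 1.0, dt)[:, None]
slow = 0.4 * np.sin(2 * np.pi * 0.5 * t) * np.ones((1, 2))  # smooth, in range

jumpy = slow.copy()
jumpy[25, 0] += 0.3   # sudden jump: stays in position range, violates velocity

too_wide = 2.0 * slow  # second joint exceeds its position range

slow_ok = passes_hard_limits(slow, q_min, q_max, qd_max, dt)
jumpy_ok = passes_hard_limits(jumpy, q_min, q_max, qd_max, dt)
too_wide_ok = passes_hard_limits(too_wide, q_min, q_max, qd_max, dt)
```

Only the smooth trajectory passes; the other two would be rejected rather than sent to the low-level controller.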