End-to-End Humanoid Robot Safe and Comfortable Locomotion Policy

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of safe, comfortable, and environment-aware navigation for humanoid robots in unstructured, human-centered environments, this paper proposes an end-to-end LiDAR point cloud–driven motion policy. Methodologically, it integrates spatiotemporal point cloud encoding, model-free Penalized Proximal Policy Optimization (P3O), and Constrained Markov Decision Processes (CMDPs). Crucially, it is the first to explicitly model Control Barrier Functions (CBFs) as differentiable constraint costs within the CMDP framework and incorporates a comfort reward mechanism grounded in human–robot interaction research. This enables joint optimization of safety, task performance, and socially compliant behavior. Experimental validation—from simulation to real-world deployment—demonstrates that the robot achieves agile, collision-free navigation around static and dynamic 3D obstacles while maintaining smooth, human-preferred motion trajectories.
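The core mechanism described above, turning a Control Barrier Function condition into a differentiable cost that a penalized policy-gradient method can enforce, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the distance-based barrier `h(x)`, the decay rate `alpha`, the penalty weight `kappa`, and the cost budget are all hypothetical placeholders.

```python
import numpy as np

def barrier(state, obstacle, radius):
    # Hypothetical barrier h(x) >= 0 on the safe set:
    # signed clearance from a spherical obstacle.
    return np.linalg.norm(np.asarray(state) - np.asarray(obstacle)) - radius

def cbf_cost(h_t, h_next, alpha=0.1):
    # Discrete-time CBF condition: h_{t+1} >= (1 - alpha) * h_t.
    # Any violation becomes a nonnegative cost signal for the CMDP,
    # so safety is learned rather than solved online with a QP.
    return max(0.0, (1.0 - alpha) * h_t - h_next)

def p3o_objective(reward_objective, expected_cost, budget, kappa=10.0):
    # P3O-style exact-penalty objective: the clipped policy objective
    # minus a hinge penalty on expected cost exceeding its budget.
    return reward_objective - kappa * max(0.0, expected_cost - budget)
```

With this shaping, a model-free learner receives a per-step cost only when the barrier decays faster than the CBF condition allows, which is how the CMDP separates the safety signal from the task reward.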

📝 Abstract
The deployment of humanoid robots in unstructured, human-centric environments requires navigation capabilities that extend beyond simple locomotion to include robust perception, provable safety, and socially aware behavior. Current reinforcement learning approaches are often limited by blind controllers that lack environmental awareness or by vision-based systems that fail to perceive complex 3D obstacles. In this work, we present an end-to-end locomotion policy that directly maps raw, spatio-temporal LiDAR point clouds to motor commands, enabling robust navigation in cluttered dynamic scenes. We formulate the control problem as a Constrained Markov Decision Process (CMDP) to formally separate safety from task objectives. Our key contribution is a novel methodology that translates the principles of Control Barrier Functions (CBFs) into costs within the CMDP, allowing a model-free Penalized Proximal Policy Optimization (P3O) to enforce safety constraints during training. Furthermore, we introduce a set of comfort-oriented rewards, grounded in human-robot interaction research, to promote motions that are smooth, predictable, and less intrusive. We demonstrate the efficacy of our framework through a successful sim-to-real transfer to a physical humanoid robot, which exhibits agile and safe navigation around both static and dynamic 3D obstacles.
Problem

Research questions and friction points this paper is trying to address.

Develop safe locomotion for humanoid robots in cluttered environments
Integrate LiDAR perception with motor control for obstacle avoidance
Ensure socially comfortable robot motions using human-inspired rewards
Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end LiDAR point cloud to motor commands
CMDP with Control Barrier Functions for safety
Comfort rewards for smooth human-like motion
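The comfort-reward idea in the list above can be illustrated with a smoothness penalty on the commanded motion. The specific terms and weights (`w_acc`, `w_jerk`) are assumptions for illustration; the paper's actual comfort rewards are grounded in human-robot interaction research and are not reproduced here.

```python
import numpy as np

def comfort_reward(vel_history, w_acc=0.5, w_jerk=0.5):
    # Illustrative smoothness terms: penalize finite-difference
    # acceleration and jerk of the commanded base velocity, so the
    # policy prefers gradual, predictable motion around people.
    v = np.asarray(vel_history, dtype=float)
    acc = np.diff(v, n=1, axis=0)   # first difference ~ acceleration
    jerk = np.diff(v, n=2, axis=0)  # second difference ~ jerk
    return -(w_acc * np.square(acc).sum() + w_jerk * np.square(jerk).sum())
```

A constant-velocity trajectory incurs no penalty, while abrupt speed changes are penalized quadratically, nudging the learned gait toward trajectories humans tend to rate as less intrusive.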