Preference-Conditioned Multi-Objective RL for Integrated Command Tracking and Force Compliance in Humanoid Locomotion

📅 2025-10-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Humanoid robots must simultaneously achieve precise navigation command tracking and compliant interaction with external forces; however, existing reinforcement learning (RL) approaches prioritize robustness, yielding overly rigid responses and insufficient compliance. This paper proposes a preference-conditioned multi-objective RL framework whose core novelty is introducing dynamic preference modulation into multi-objective RL, unifying rigid trajectory tracking and compliant behavior within a single policy. The method explicitly models external force effects via a velocity-resistance factor and employs an encoder-decoder architecture to extract privileged features from lightweight observations, enabling end-to-end omnidirectional walking control. Evaluated in simulation and on a physical humanoid platform, the method significantly improves policy adaptability and training convergence speed, supports real-time behavioral preference switching, and achieves high-performance, deployable bipedal locomotion.

📝 Abstract
Humanoid locomotion requires not only accurate command tracking for navigation but also compliant responses to external forces during human interaction. Despite significant progress, existing RL approaches mainly emphasize robustness, yielding policies that resist external forces but lack compliance, which is particularly challenging for inherently unstable humanoids. In this work, we address this by formulating humanoid locomotion as a multi-objective optimization problem that balances command tracking and external force compliance. We introduce a preference-conditioned multi-objective RL (MORL) framework that integrates rigid command following and compliant behaviors within a single omnidirectional locomotion policy. External forces are modeled via a velocity-resistance factor for consistent reward design, and training leverages an encoder-decoder structure that infers task-relevant privileged features from deployable observations. We validate our approach in both simulation and real-world experiments on a humanoid robot. Experimental results indicate that our framework not only improves adaptability and convergence over standard pipelines, but also realizes deployable preference-conditioned humanoid locomotion.
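The abstract frames tracking and compliance as competing objectives blended by a preference. The paper's exact scalarization is not given here, so the following is a minimal sketch of one standard way to condition a policy's reward on a preference weight: a convex combination of a tracking reward and a compliance reward, where `w` is an assumed scalar preference (the paper may use a full preference vector).

```python
import numpy as np

def scalarized_reward(r_track: float, r_comply: float, w: float) -> float:
    """Convex combination of tracking and compliance rewards.

    w in [0, 1] is a preference weight: w=1 favors rigid command
    tracking, w=0 favors compliant yielding to external forces.
    This is an illustrative scalarization, not the paper's formula.
    """
    w = float(np.clip(w, 0.0, 1.0))
    return w * r_track + (1.0 - w) * r_comply
```

In a preference-conditioned setup, `w` would also be appended to the policy's observation so that a single network can switch behaviors at deployment time without retraining.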
Problem

Research questions and friction points this paper is trying to address.

Balancing command tracking with force compliance in humanoid locomotion
Integrating rigid and compliant behaviors in a single policy
Developing a deployable preference-conditioned multi-objective RL framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-objective RL balances command tracking and force compliance
Velocity-resistance factor models external forces for reward design
Encoder-decoder structure infers privileged features from observations
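The second bullet says external forces enter the reward through a velocity-resistance factor. The concrete form is not shown in this summary, so here is one hedged interpretation: an external force shifts the effective velocity target away from the command, so a robot that yields to a push is rewarded rather than penalized for deviating. The coefficient `alpha` and the Gaussian width `sigma` are assumed parameters, not values from the paper.

```python
import numpy as np

def resisted_target(v_cmd, f_ext, alpha=0.1):
    """Hypothetical velocity-resistance model: external force f_ext
    attenuates/shifts the commanded velocity target v_cmd by alpha * f_ext,
    making compliant motion consistent with the tracking reward."""
    return np.asarray(v_cmd, dtype=float) + alpha * np.asarray(f_ext, dtype=float)

def tracking_reward(v_base, v_cmd, f_ext, alpha=0.1, sigma=0.25):
    """Gaussian tracking reward around the force-adjusted target
    (a common RL locomotion reward shape; illustrative only)."""
    err = np.linalg.norm(np.asarray(v_base, dtype=float)
                         - resisted_target(v_cmd, f_ext, alpha))
    return float(np.exp(-(err ** 2) / sigma))
```

Under this sketch, a base velocity that drifts with the push direction scores higher than one that rigidly holds the original command, which is the compliance behavior the paper's reward design aims to make consistent.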
Tingxuan Leng
Tsinghua University, Beijing 100084, China
Yushi Wang
Tsinghua University (Robotics)
Tinglong Zheng
Beijing Jiaotong University, Beijing 100044, China
Changsheng Luo
Tsinghua University, Beijing 100084, China
Mingguo Zhao
Tsinghua University (Robotics)