🤖 AI Summary
Humanoid robots must simultaneously achieve precise navigation command tracking and compliant interaction with external forces; however, existing reinforcement learning (RL) approaches prioritize robustness, yielding overly rigid responses and insufficient compliance. This paper proposes a preference-conditioned multi-objective RL framework whose central novelty is dynamic preference modulation, unifying rigid trajectory tracking and compliant behavior within a single policy. We explicitly model external force effects via a velocity-resistance factor and employ an encoder-decoder architecture to extract privileged features from lightweight observations, enabling end-to-end omnidirectional walking control. Evaluated in simulation and on a physical humanoid platform, our method significantly improves policy adaptability and training convergence speed, supports real-time behavioral preference switching, and achieves high-performance, deployable bipedal locomotion.
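The preference-conditioned objective can be read as a scalarization over the two competing rewards. The sketch below illustrates that reading under stated assumptions: the exponential reward shapes, the `k_res` resistance gain, and the way the preference vector is appended to the observation are illustrative choices, not the paper's actual formulation.

```python
import numpy as np

def sample_preference(rng):
    """Sample a preference vector on the 2-simplex (tracking vs. compliance).

    Resampling per episode exposes the policy to the full trade-off,
    which is what allows preference switching at deployment time.
    """
    w_track = rng.uniform(0.0, 1.0)
    return np.array([w_track, 1.0 - w_track])

def scalarized_reward(w, v_cmd, v_base, f_ext, k_res=0.1):
    """Blend rigid tracking and compliant tracking under preference w.

    A velocity-resistance factor (k_res, an assumed gain) shifts the
    velocity target along a sustained external force, so yielding to a
    push is rewarded instead of penalized.
    """
    v_target = v_cmd + k_res * f_ext                      # compliant target
    r_track = np.exp(-np.sum((v_base - v_cmd) ** 2))      # rigid tracking
    r_comply = np.exp(-np.sum((v_base - v_target) ** 2))  # compliant tracking
    return w[0] * r_track + w[1] * r_comply

rng = np.random.default_rng(0)
w = sample_preference(rng)                # also conditions the policy:
obs = np.concatenate([np.zeros(3), w])    # preference appended to the obs
r = scalarized_reward(w, v_cmd=np.array([0.5, 0.0]),
                      v_base=np.array([0.4, 0.1]),
                      f_ext=np.array([1.0, 0.0]))
```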
📝 Abstract
Humanoid locomotion requires not only accurate command tracking for navigation but also compliant responses to external forces during human interaction. Despite significant progress, existing RL approaches mainly emphasize robustness, yielding policies that resist external forces but lack compliance, a shortcoming that is particularly challenging for inherently unstable humanoids. In this work, we address this by formulating humanoid locomotion as a multi-objective optimization problem that balances command tracking and external force compliance. We introduce a preference-conditioned multi-objective RL (MORL) framework that integrates rigid command following and compliant behaviors within a single omnidirectional locomotion policy. External forces are modeled via a velocity-resistance factor for consistent reward design, and training leverages an encoder-decoder structure that infers task-relevant privileged features from deployable observations. We validate our approach in both simulation and real-world experiments on a humanoid robot. Experimental results indicate that our framework not only improves adaptability and convergence over standard pipelines, but also realizes deployable preference-conditioned humanoid locomotion.
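As a companion sketch for the encoder-decoder training signal described above: a minimal PyTorch module that compresses a window of deployable observations into a latent and regresses simulation-only privileged features from it. All layer sizes and dimensions below are assumptions for illustration; the paper's actual architecture may differ.

```python
import torch
import torch.nn as nn

class PrivilegedEstimator(nn.Module):
    """Encoder-decoder that infers privileged features from deployable obs.

    The encoder compresses a history of onboard observations into a
    latent; the decoder reconstructs privileged, simulation-only signals
    (e.g. external force, velocity-resistance factor) as a training
    target. At deployment only the encoder runs, and its latent is fed
    to the policy alongside the preference vector.
    """
    def __init__(self, obs_dim, history, latent_dim, priv_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim * history, 128), nn.ELU(),
            nn.Linear(128, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ELU(),
            nn.Linear(128, priv_dim),
        )

    def forward(self, obs_history):
        z = self.encoder(obs_history)
        return z, self.decoder(z)

# All dimensions below are illustrative, not taken from the paper.
model = PrivilegedEstimator(obs_dim=45, history=5, latent_dim=16, priv_dim=4)
obs_hist = torch.randn(64, 45 * 5)    # batch of observation windows
priv_true = torch.randn(64, 4)        # privileged labels from the simulator
z, priv_pred = model(obs_hist)
recon_loss = nn.functional.mse_loss(priv_pred, priv_true)  # auxiliary loss
```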