PCHC: Enabling Preference Conditioned Humanoid Control via Multi-Objective Reinforcement Learning

📅 2026-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of dynamically balancing conflicting objectives, such as speed and energy consumption, in multi-objective control for humanoid robots, where conventional approaches rely on fixed reward weights and yield only a single suboptimal policy. To overcome this limitation, the authors propose a preference-conditioned control framework based on multi-objective reinforcement learning. By combining a mixture-of-experts (MoE) architecture modulated by a preference vector with a Beta distribution alignment mechanism, the method enables a single policy to span diverse behaviors across the Pareto front. This removes the need to train multiple policies and allows switching among behavior modes at runtime according to user-specified preferences, achieving flexible trade-offs between competing objectives in simulation and on real humanoid platforms.
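The paper's architectural details are not reproduced in this summary, so the following PyTorch sketch is only a plausible reading of a preference-conditioned MoE policy: a gating network maps the preference vector to mixture weights over expert networks, so changing the preference at runtime re-weights the experts without retraining. The class name, layer sizes, and softmax gating are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PreferenceConditionedMoEPolicy(nn.Module):
    """Hypothetical preference-conditioned Mixture-of-Experts policy.

    Each expert maps (observation, preference) to an action; a gating
    network conditioned on the preference vector alone produces mixture
    weights, so new preferences re-weight experts at inference time.
    """

    def __init__(self, obs_dim, act_dim, pref_dim, num_experts=4, hidden=256):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(obs_dim + pref_dim, hidden), nn.ELU(),
                nn.Linear(hidden, hidden), nn.ELU(),
                nn.Linear(hidden, act_dim),
            )
            for _ in range(num_experts)
        ])
        self.gate = nn.Sequential(
            nn.Linear(pref_dim, hidden), nn.ELU(),
            nn.Linear(hidden, num_experts),
        )

    def forward(self, obs, pref):
        # obs: (B, obs_dim), pref: (B, pref_dim), entries of pref sum to 1.
        x = torch.cat([obs, pref], dim=-1)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-2)  # (B, E, act_dim)
        weights = torch.softmax(self.gate(pref), dim=-1).unsqueeze(-1)  # (B, E, 1)
        return (weights * expert_out).sum(dim=-2)                       # (B, act_dim)
```

Conditioning the gate on the preference alone keeps the behavior-mode switch decoupled from the state, which is one simple way to realize the runtime mode switching the summary describes.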

📝 Abstract
Humanoid robots often need to balance competing objectives, such as maximizing speed while minimizing energy consumption. While current reinforcement learning (RL) methods can master complex skills like fall recovery and perceptive locomotion, they are constrained by fixed weighting strategies that produce a single suboptimal policy rather than a diverse set of solutions for sophisticated multi-objective control. In this paper, we propose a novel framework leveraging Multi-Objective Reinforcement Learning (MORL) to achieve Preference-Conditioned Humanoid Control (PCHC). Unlike conventional methods that require training a series of policies to approximate the Pareto front, our framework enables a single, preference-conditioned policy to exhibit a wide spectrum of diverse behaviors. To integrate these requirements effectively, we introduce a Beta distribution-based alignment mechanism in which preference vectors modulate a Mixture-of-Experts (MoE) module. We validate our approach on two representative humanoid tasks. Extensive simulations and real-world experiments demonstrate that the proposed framework allows the robot to adaptively shift its objective priorities in real time based on the input preference condition.
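The abstract names a Beta distribution but not its exact role in the alignment mechanism. A common MORL pattern is to sample a preference vector per episode and train on the scalarized reward; the sketch below shows that pattern for two objectives (e.g. speed vs. energy), with the preference weight drawn from a Beta distribution. The function names, the Beta parameters, and the linear scalarization are assumptions, not the paper's method.

```python
import torch

def sample_preference(alpha: float = 1.0, beta: float = 1.0, batch: int = 1) -> torch.Tensor:
    """Sample two-objective preference vectors (w, 1 - w) with w ~ Beta(alpha, beta).

    Hypothetical stand-in for the paper's Beta-based mechanism: with
    alpha = beta = 1 this is uniform over the two-objective preference simplex.
    """
    w = torch.distributions.Beta(alpha, beta).sample((batch,))
    return torch.stack([w, 1.0 - w], dim=-1)  # (batch, 2)

def scalarize(rewards: torch.Tensor, pref: torch.Tensor) -> torch.Tensor:
    """Linear scalarization: weight each per-objective reward by the preference."""
    # rewards: (batch, num_objectives), pref: (batch, num_objectives)
    return (rewards * pref).sum(dim=-1)

# Example: one preference per parallel environment, held fixed for the episode.
pref = sample_preference(batch=4)
scalar_reward = scalarize(torch.randn(4, 2), pref)
```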
Problem

Research questions and friction points this paper is trying to address.

Humanoid Control
Multi-Objective Reinforcement Learning
Preference Conditioning
Pareto Optimality
Behavior Diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Objective Reinforcement Learning
Preference-Conditioned Control
Mixture-of-Experts
Humanoid Robotics
Pareto Front
Authors

Huanyu Li · Harbin Institute of Technology
Dewei Wang · USTC · Robotics
Xinmiao Wang · Harbin Engineering University
Xinzhe Liu · ShanghaiTech University · Robotics
Peng Liu · Harbin Institute of Technology
Chenjia Bai · Institute of Artificial Intelligence, China Telecom (TeleAI) · Reinforcement Learning, Robotics, Embodied AI
Xuelong Li · Institute of Artificial Intelligence (TeleAI), China Telecom