PvP: Data-Efficient Humanoid Robot Learning with Proprioceptive-Privileged Contrastive Representations

📅 2025-12-15

📈 Citations: 0

✨ Influential: 0

career value

233K/year

🤖 AI Summary

To address the low sample efficiency and partial observability challenges in whole-body control (WBC) for humanoid robots, this paper proposes a proprioception–privileged-state complementary contrastive representation learning framework. Our method jointly models proprioceptive signals and privileged states—without manual data augmentation—to learn compact, task-relevant latent representations. We introduce the novel paradigm of “proprioception-privileged contrastive learning” and present SRL4Humanoid, the first modular, unified evaluation framework for state representation learning tailored to humanoid robotics. Experiments on the LimX Oli platform demonstrate that our approach accelerates convergence by 2.3× and improves task success rates by 37% in velocity tracking and motion imitation tasks, while significantly enhancing robustness and sample efficiency under dynamic environmental conditions.

Technology Category

Application Category

📝 Abstract

Achieving efficient and robust whole-body control (WBC) is essential for enabling humanoid robots to perform complex tasks in dynamic environments. Despite the success of reinforcement learning (RL) in this domain, its sample inefficiency remains a significant challenge due to the intricate dynamics and partial observability of humanoid robots. To address this limitation, we propose PvP, a Proprioceptive-Privileged contrastive learning framework that leverages the intrinsic complementarity between proprioceptive and privileged states. PvP learns compact and task-relevant latent representations without requiring hand-crafted data augmentations, enabling faster and more stable policy learning. To support systematic evaluation, we develop SRL4Humanoid, the first unified and modular framework that provides high-quality implementations of representative state representation learning (SRL) methods for humanoid robot learning. Extensive experiments on the LimX Oli robot across velocity tracking and motion imitation tasks demonstrate that PvP significantly improves sample efficiency and final performance compared to baseline SRL methods. Our study further provides practical insights into integrating SRL with RL for humanoid WBC, offering valuable guidance for data-efficient humanoid robot learning.

Problem

Research questions and friction points this paper is trying to address.

Improves sample efficiency in humanoid robot reinforcement learning

Learns compact task-relevant representations without manual data augmentation

Enables faster and more stable whole-body control policy learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proprioceptive-Privileged contrastive learning for compact representations

No hand-crafted data augmentations for stable policy learning

Unified modular framework for state representation learning evaluation

🔎 Similar Papers

Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation