Diverse Skill Discovery for Quadruped Robots via Unsupervised Learning

📅 2026-02-10
🤖 AI Summary
This work addresses two problems that keep existing unsupervised skill discovery methods from acquiring diverse locomotion skills: low learning efficiency and reward hacking. The authors propose an unsupervised reinforcement learning framework that combines an Orthogonal Mixture-of-Experts (OMoE) architecture with a multi-discriminator framework, and evaluate it on the 12-DOF Unitree A1 quadruped robot. The OMoE structure keeps diverse behaviors from collapsing into overlapping representations, while the discriminators operate on distinct observation spaces to mitigate reward hacking. Experiments show that the method boosts training efficiency and expands state-space coverage by 18.3% compared to the baseline, while producing a diverse set of locomotion skills.

📝 Abstract
Reinforcement learning necessitates meticulous reward shaping by specialists to elicit target behaviors, while imitation learning relies on costly task-specific data. In contrast, unsupervised skill discovery can potentially reduce these burdens by learning a diverse repertoire of useful skills driven by intrinsic motivation. However, existing methods exhibit two key limitations: they typically rely on a single policy to master a versatile repertoire of behaviors without modeling the shared structure or distinctions among them, which results in low learning efficiency; moreover, they are susceptible to reward hacking, where the reward signal increases and converges rapidly while the learned skills display insufficient actual diversity. In this work, we introduce an Orthogonal Mixture-of-Experts (OMoE) architecture that prevents diverse behaviors from collapsing into overlapping representations, enabling a single policy to master a wide spectrum of locomotion skills. In addition, we design a multi-discriminator framework in which different discriminators operate on distinct observation spaces, effectively mitigating reward hacking. We evaluated our method on the 12-DOF Unitree A1 quadruped robot, demonstrating a diverse set of locomotion skills. Our experiments demonstrate that the proposed framework boosts training efficiency and yields an 18.3% expansion in state-space coverage compared to the baseline.
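
The abstract does not say how the OMoE architecture enforces orthogonality, so the following is only an illustrative sketch of one way expert representations could be kept from overlapping. It assumes Gram-Schmidt orthogonalization of the expert output vectors followed by a softmax-gated sum. The function names (`gram_schmidt`, `omoe_combine`) and the gating scheme are hypothetical and are not taken from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, eps=1e-8):
    """Orthonormalize `vectors` in order (modified Gram-Schmidt).

    A vector that is linearly dependent on earlier ones is replaced by a
    zero vector so that indices stay aligned with the gate weights.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = dot(w, b)
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:
            basis.append([wi / norm for wi in w])
        else:
            basis.append([0.0] * len(w))
    return basis


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def omoe_combine(expert_outputs, gate_logits):
    """Assumed combination rule: orthogonalize the expert outputs, then
    mix them with softmax gate weights."""
    ortho = gram_schmidt(expert_outputs)
    gates = softmax(gate_logits)
    dim = len(expert_outputs[0])
    return [sum(g * e[i] for g, e in zip(gates, ortho)) for i in range(dim)]


# Hypothetical outputs of three experts for one observation.
experts = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
ortho = gram_schmidt(experts)
mixed = omoe_combine(experts, gate_logits=[0.2, 0.1, -0.3])
```

After orthogonalization, the first two experts no longer share a direction even though their raw outputs overlap, which is the property the abstract attributes to OMoE. How the actual architecture achieves this may differ.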
Problem

Research questions and friction points this paper is trying to address.

unsupervised skill discovery
reward hacking
locomotion skills
quadruped robots
learning efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Orthogonal Mixture-of-Experts
unsupervised skill discovery
multi-discriminator framework
reward hacking mitigation
quadruped locomotion
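
To make the multi-discriminator idea concrete, here is a minimal sketch under stated assumptions. It assumes a DIAYN-style intrinsic reward, log q(z | o) - log p(z), computed by each discriminator on its own slice of the observation, and it assumes the per-discriminator rewards are combined with a minimum so that fooling a single discriminator is not enough. The observation slices, the linear discriminators, and the `min` aggregation are all hypothetical; the source states only that different discriminators operate on distinct observation spaces.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def linear_discriminator(weights):
    """Hypothetical stand-in for a learned discriminator: maps a feature
    vector to one logit per skill using a fixed weight matrix."""
    def logits(features):
        return [sum(w * f for w, f in zip(row, features)) for row in weights]
    return logits


def multi_discriminator_reward(obs, skill, views, discriminators, num_skills):
    """Return (combined_reward, per_discriminator_rewards).

    views[k] lists the observation indices that discriminator k sees.
    Each reward is log q_k(skill | view_k(obs)) - log p(skill), with a
    uniform skill prior. The minimum is an assumed aggregation rule.
    """
    log_prior = -math.log(num_skills)
    rewards = []
    for indices, disc in zip(views, discriminators):
        features = [obs[i] for i in indices]
        log_q = log_softmax(disc(features))[skill]
        rewards.append(log_q - log_prior)
    return min(rewards), rewards


# Hypothetical setup: two skills, a three-dimensional observation, and two
# discriminators that see disjoint parts of it.
views = [[0, 1], [2]]
discriminators = [
    linear_discriminator([[5.0, 0.0], [0.0, 5.0]]),  # confident on its view
    linear_discriminator([[0.0], [0.0]]),            # cannot tell skills apart
]
combined, per_disc = multi_discriminator_reward(
    obs=[1.0, 0.0, 0.3], skill=0, views=views,
    discriminators=discriminators, num_skills=2,
)
```

In this toy case the first discriminator assigns a high reward, but the second cannot distinguish the skills from its view, so the combined reward stays at zero. That illustrates why requiring agreement across observation spaces can make a rapidly rising reward harder to obtain without real behavioral diversity.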
Ruopeng Cui
College of Intelligent Robotics and Advanced Manufacturing, Fudan University, Shanghai, China
Yifei Bi
College of Foreign Language, The University of Shanghai for Science and Technology, Shanghai, China
Haojie Luo
College of Intelligent Robotics and Advanced Manufacturing, Fudan University, Shanghai, China
Wei Li
Fudan University
Robotics, Collective Intelligence, Machine Learning