🤖 AI Summary
To address the low safety, poor interpretability, and deployment difficulty of unsupervised skill discovery (USD) in real-world robotics, this paper proposes a modular framework built on a user-defined factorization of the state space that disentangles the learned skill representations. Methodologically, it combines symmetry-based inductive biases with a style factor and regularization penalties, assigning a skill discovery algorithm and intrinsic reward to each state factor to learn morphology-aware, safety-constrained, structured, and interpretable skills. The key contribution is the integration of symmetry priors and explicit style disentanglement into USD, enabling zero-shot sim-to-real transfer. Experiments demonstrate that the learned skills exhibit clear semantic meaning in simulation and transfer, without fine-tuning, to a physical quadruped robot. In downstream tasks, their performance matches that of oracle policies trained with hand-engineered reward functions.
📝 Abstract
Unsupervised Skill Discovery (USD) allows agents to autonomously learn diverse behaviors without task-specific rewards. While recent USD methods have shown promise, their application to real-world robotics remains underexplored. In this paper, we propose a modular USD framework that addresses the challenges of safety, interpretability, and deployability of the learned skills. Our approach employs a user-defined factorization of the state space to learn disentangled skill representations, assigning a different skill discovery algorithm to each factor based on the desired intrinsic reward function. To encourage structured, morphology-aware skills, we introduce symmetry-based inductive biases tailored to individual factors. We also incorporate a style factor and regularization penalties to promote safe and robust behaviors. We evaluate our framework in simulation on a quadrupedal robot and demonstrate zero-shot transfer of the learned skills to real hardware. Our results show that factorization and symmetry lead to the discovery of structured, human-interpretable behaviors, while the style factor and penalties enhance safety and diversity. Additionally, we show that the learned skills can be used for downstream tasks and perform on par with oracle policies trained with hand-crafted rewards.
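The core idea above — a per-factor intrinsic reward plus a style penalty — can be sketched in a few lines. Everything here is an illustrative assumption (factor names, slice boundaries, a DIAYN-style log-probability reward, and the penalty form), not the paper's actual implementation:

```python
import numpy as np

# User-defined state factorization (assumed slices, for illustration only):
# each factor selects a sub-vector of the robot state.
FACTORS = {
    "base_motion": slice(0, 3),  # e.g. base linear/angular velocity
    "gait":        slice(3, 9),  # e.g. a subset of joint positions
}

def diversity_reward(factor_state, skill, predictor):
    """DIAYN-style intrinsic reward: log-probability that the active skill
    can be recovered from this state factor alone (higher = more
    distinguishable, hence more diverse, behavior on this factor)."""
    logits = predictor(factor_state)                 # score per discrete skill
    log_probs = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    return log_probs[skill]

def style_penalty(state, limits):
    """Stand-in for the style/regularization penalties: penalize state
    components that exceed user-given safety limits."""
    return -float(np.sum(np.clip(np.abs(state) - limits, 0.0, None)))

def intrinsic_reward(state, skills, predictors, limits, weights):
    """Combine per-factor diversity rewards with a global style penalty."""
    r = sum(weights[name] * diversity_reward(state[sl], skills[name],
                                             predictors[name])
            for name, sl in FACTORS.items())
    return r + weights["style"] * style_penalty(state, limits)

# Toy usage: random linear maps stand in for learned skill discriminators.
rng = np.random.default_rng(0)
state = rng.normal(size=9)
predictors = {n: (lambda x, W=rng.normal(size=(4, s.stop - s.start)): W @ x)
              for n, s in FACTORS.items()}
skills = {"base_motion": 0, "gait": 2}            # active skill per factor
weights = {"base_motion": 1.0, "gait": 1.0, "style": 0.1}
r = intrinsic_reward(state, skills, predictors, np.full(9, 2.0), weights)
```

The modularity in the paper corresponds to the fact that each factor could use a different `diversity_reward` (i.e. a different USD objective) while the combination rule stays the same.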