🤖 AI Summary
Unsupervised reinforcement learning in high-dimensional state spaces (e.g., images) suffers from insufficient skill diversity and poor exploration efficiency; existing mutual information (MI)-based approaches are prone to estimation bias and fail to support effective behavioral disentanglement. To address this, we propose a skill-region discriminative objective that bypasses MI estimation entirely and directly optimizes the separability of the state distributions induced by distinct skills. We design a soft modular conditional autoencoder to model skill-specific latent-space densities and integrate a latent-space counting-based intrinsic reward to drive unsupervised skill discovery. Evaluated on both image-based and low-dimensional state-space tasks, our method learns semantically coherent, transferable, and diverse skills. In downstream task fine-tuning, it significantly outperforms established baselines, including entropy maximization and empowerment-driven methods, demonstrating superior skill utility and generalization.
📝 Abstract
Unsupervised Reinforcement Learning (RL) aims to discover diverse behaviors that can accelerate the learning of downstream tasks. Previous methods typically focus on entropy-based exploration or empowerment-driven skill learning. However, entropy-based exploration struggles in large-scale state spaces (e.g., images), and empowerment-based methods that rely on Mutual Information (MI) estimation explore the state space poorly. To address these challenges, we propose a novel skill discovery objective that maximizes the deviation of one skill's state density from the regions explored by other skills, encouraging inter-skill state diversity in the same spirit as the original MI objective. For state-density estimation, we construct a novel conditional autoencoder with soft modularization across the different skill policies in high-dimensional space. Meanwhile, to incentivize intra-skill exploration, we formulate an intrinsic reward based on the learned autoencoder that resembles count-based exploration in a compact latent space. Through extensive experiments on challenging state- and image-based tasks, we find that our method learns meaningful skills and achieves superior performance on various downstream tasks.
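The count-based intrinsic reward mentioned above can be illustrated with a minimal sketch. The paper derives its bonus from the learned autoencoder's latent densities; here we assume, purely for illustration, that an encoder has already mapped each state to a low-dimensional latent vector `z`, and we approximate counting by discretizing that latent space into bins (the bin size and the `1/sqrt(count)` bonus shape are our assumptions, borrowed from classical count-based exploration, not the authors' exact formulation):

```python
import numpy as np
from collections import defaultdict

class LatentCountReward:
    """Illustrative count-based intrinsic reward in a discretized latent space.

    Assumes states have already been encoded into latent vectors `z` by a
    (hypothetical) trained encoder; visits are counted per latent-space bin.
    """

    def __init__(self, bin_size=0.5):
        self.bin_size = bin_size
        self.counts = defaultdict(int)  # visit count per discretized latent cell

    def _key(self, z):
        # Discretize the latent vector into an integer grid cell.
        return tuple(np.floor(np.asarray(z, dtype=float) / self.bin_size).astype(int))

    def reward(self, z):
        # Classic count-based bonus: rarely visited cells yield larger rewards.
        key = self._key(z)
        self.counts[key] += 1
        return 1.0 / np.sqrt(self.counts[key])
```

A first visit to a latent cell yields a bonus of 1.0; repeated visits decay the bonus as `1/sqrt(n)`, pushing the skill policy toward unvisited regions of the compact latent space.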