🤖 AI Summary
Unsupervised reinforcement learning in high-dimensional state spaces (e.g., images) suffers from insufficient skill diversity and poor exploration efficiency; existing mutual information (MI)-based approaches are prone to estimation bias and fail to support effective behavioral disentanglement. To address this, we propose a skill-region discriminative objective that bypasses MI estimation entirely and directly optimizes the separability of the state distributions induced by distinct skills. We design a soft modular conditional autoencoder to model skill-specific latent-space densities and integrate a latent-space counting-based intrinsic reward to drive unsupervised skill discovery. Evaluated on both image-based and low-dimensional state-space tasks, our method learns semantically coherent, transferable, and diverse skills. In downstream task fine-tuning, it significantly outperforms established baselines, including entropy maximization and empowerment-driven methods, demonstrating superior skill utility and generalization.
📝 Abstract
Unsupervised Reinforcement Learning (RL) aims to discover diverse behaviors that can accelerate the learning of downstream tasks. Previous methods typically focus on entropy-based exploration or empowerment-driven skill learning. However, entropy-based exploration struggles in large-scale state spaces (e.g., images), and empowerment-based methods that rely on Mutual Information (MI) estimation explore the state space poorly. To address these challenges, we propose a novel skill discovery objective that maximizes the deviation of one skill's state density from the regions explored by other skills, encouraging inter-skill state diversity in the same spirit as the original MI objective. For state-density estimation, we construct a novel conditional autoencoder with soft modularization across the different skill policies in high-dimensional space. Meanwhile, to incentivize intra-skill exploration, we formulate an intrinsic reward based on the learned autoencoder that resembles count-based exploration in a compact latent space. Through extensive experiments on challenging state- and image-based tasks, we find that our method learns meaningful skills and achieves superior performance on various downstream tasks.
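The count-based intrinsic reward mentioned above can be illustrated with a minimal sketch. The paper derives its bonus from the learned autoencoder's latent densities; here we assume, purely for illustration, that an encoder has already mapped each state to a low-dimensional latent vector `z`, and we approximate counting by discretizing that latent space into bins (the bin size and the `1/sqrt(count)` bonus shape are our assumptions, borrowed from classical count-based exploration, not the authors' exact formulation):

```python
import numpy as np
from collections import defaultdict

class LatentCountReward:
    """Illustrative count-based intrinsic reward in a discretized latent space.

    Assumes states have already been encoded into latent vectors `z` by a
    (hypothetical) trained encoder; visits are counted per latent-space bin.
    """

    def __init__(self, bin_size=0.5):
        self.bin_size = bin_size
        self.counts = defaultdict(int)  # visit count per discretized latent cell

    def _key(self, z):
        # Discretize the latent vector into an integer grid cell.
        return tuple(np.floor(np.asarray(z, dtype=float) / self.bin_size).astype(int))

    def reward(self, z):
        # Classic count-based bonus: rarely visited cells yield larger rewards.
        key = self._key(z)
        self.counts[key] += 1
        return 1.0 / np.sqrt(self.counts[key])
```

A first visit to a latent cell yields a bonus of 1.0; repeated visits decay the bonus as `1/sqrt(n)`, pushing the skill policy toward unvisited regions of the compact latent space.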