🤖 AI Summary
This work investigates whether mutual information skill learning (MISL) can match or surpass the performance of Wasserstein-distance-based skill learning (e.g., METRA) without abandoning the mutual information principle. To this end, it proposes Contrastive Successor Features (CSF), a framework that combines mutual information maximization, contrastive representation learning, and successor feature prediction while having fewer moving parts than METRA. Theoretically, the work establishes formal connections among skill discovery, contrastive representation learning, and successor representations. Empirically, CSF matches or exceeds METRA in both skill diversity and zero-shot transfer performance. Ablation studies identify the design choices that matter for effective skill learning and clarify the role of each component. Overall, CSF offers a representation-driven approach to mutual information-based skill learning that advances both the theoretical understanding and the practical efficacy of unsupervised skill discovery.
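The link between MISL and contrastive representation learning can be illustrated with a generic InfoNCE estimator, which lower-bounds the mutual information between transitions and the skills that produced them. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the critic (φ(s') − φ(s))ᵀz is an assumed parameterization, and the function names and random inputs are hypothetical.

```python
import numpy as np

def logsumexp(x, axis):
    # Numerically stable log-sum-exp along one axis
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def infonce_mi_lower_bound(phi_s, phi_next, z):
    """InfoNCE lower bound on I((S, S'); Z) for a batch of N transitions.

    phi_s, phi_next: (N, d) features of s and s' (hypothetical encoder outputs).
    z: (N, d) skills; z[i] is the skill that generated transition i.
    """
    n = z.shape[0]
    # Assumed critic: f(s, s', z) = (phi(s') - phi(s)) . z
    logits = (phi_next - phi_s) @ z.T  # logits[i, j] scores transition i against skill j
    # Positive pairs sit on the diagonal; the other skills in the batch act as negatives
    per_sample = np.diag(logits) - logsumexp(logits, axis=1)
    return np.mean(per_sample) + np.log(n)

# Hypothetical usage on random features and skills
rng = np.random.default_rng(0)
phi_s, phi_next, z = (rng.normal(size=(8, 4)) for _ in range(3))
print(infonce_mi_lower_bound(phi_s, phi_next, z))
```

Each per-transition term is at most zero, so the estimate can never exceed log N; a larger batch raises the ceiling on how much mutual information the bound can certify.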
📝 Abstract
Self-supervised learning has the potential to address several of the key challenges in reinforcement learning today, such as exploration, representation learning, and reward design. Recent work (METRA) has effectively argued that moving away from mutual information and instead optimizing a certain Wasserstein distance is important for good performance. In this paper, we argue that the benefits seen in that paper can largely be explained within the existing framework of mutual information skill learning (MISL). Our analysis suggests a new MISL method (contrastive successor features) that retains the excellent performance of METRA with fewer moving parts, and it highlights connections between skill learning, contrastive representation learning, and successor features. Finally, through careful ablation studies, we provide further insight into some of the key ingredients for both our method and METRA.
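The connection to successor features rests on a standard identity: if a reward is linear in the features, r(s) = φ(s)ᵀz, then its value under a fixed policy is V(s) = ψ(s)ᵀz, where ψ(s) = E[Σ_t γ^t φ(s_t) | s_0 = s]. The tabular sketch below checks this identity numerically; it is an illustration under stated assumptions, not the CSF training loop. The three-state cycle, one-hot features, skill vector z, discount, and step size are all hypothetical, and CSF's actual features and reward may be parameterized differently.

```python
import numpy as np

def successor_features(P, phi, gamma=0.9, lr=0.5, sweeps=500):
    """Estimate successor features of a fixed policy with expected TD(0) backups.

    P:   (S, S) state-transition matrix under the policy (rows sum to 1).
    phi: (S, d) per-state feature matrix.
    Returns psi (S, d) with psi[s] ~= E[sum_t gamma**t * phi(s_t) | s_0 = s].
    """
    psi = np.zeros(phi.shape)
    for _ in range(sweeps):
        # Bellman target for successor features: phi(s) + gamma * E[psi(s')]
        target = phi + gamma * P @ psi
        # Move psi a fraction lr of the way toward the target
        psi += lr * (target - psi)
    return psi

# Hypothetical example: a deterministic 3-state cycle with one-hot features
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
phi = np.eye(3)
psi = successor_features(P, phi)

# A skill vector z defines the reward r = phi @ z; its value is read off as psi @ z
z = np.array([1.0, 0.0, -1.0])
v_sf = psi @ z
# Exact policy evaluation for comparison: V = (I - gamma * P)^{-1} r
v_exact = np.linalg.solve(np.eye(3) - 0.9 * P, phi @ z)
print(v_sf, v_exact)
```

Because ψ here does not depend on z, one set of successor features yields the value of every reward in the span of φ. In skill learning the policy itself depends on z, so ψ would be conditioned on z as well; the fixed-policy case only isolates the identity.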