Skill Learning via Policy Diversity Yields Identifiable Representations for Reinforcement Learning

📅 2025-07-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the representation identifiability problem in mutual-information skill learning (MISL) for reinforcement learning. To remedy the lack of theoretical guarantees in existing approaches, the authors provide the first rigorous identifiability analysis of Contrastive Successor Features (CSF) within an RL framework. The theory establishes a fundamental connection among policy diversity, the inner-product parameterization of mutual information, and recovery of the true environmental features. They prove that CSF, by jointly maximizing skill discriminability and minimizing conditional entropy, recovers the environment's ground-truth features up to a linear transformation. Moreover, the choice of entropy regularization and mutual information objective critically governs the degree of identifiability. Empirical evaluation on MuJoCo and the DeepMind Control Suite demonstrates that CSF consistently reconstructs physical state features, validating both the theoretical predictions and the method's practical efficacy.
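The inner-product critic at the heart of this analysis can be illustrated with a minimal InfoNCE-style objective: state features φ(s) are scored against skill vectors z via f(s, z) = φ(s)ᵀz, with the skill that generated each trajectory as the positive and the other skills in the batch as negatives. The function names and batch construction below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def logsumexp(x, axis=None, keepdims=False):
    """Numerically stable log-sum-exp along an axis."""
    m = np.max(x, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))
    return out if keepdims else np.squeeze(out, axis=axis)

def infonce_inner_product(phi, skills):
    """Contrastive (InfoNCE-style) loss with an inner-product critic.

    phi:    (N, d) array of state features phi(s_i), one per sampled transition.
    skills: (N, d) array of skill vectors z_i, where z_i generated trajectory i.

    Row i's positive pair is (phi_i, z_i); the other skills in the batch serve
    as negatives. Minimizing this loss maximizes a lower bound on I(s; z).
    """
    logits = phi @ skills.T                              # (N, N) critic scores
    log_probs = logits - logsumexp(logits, axis=1, keepdims=True)
    return -np.mean(np.diag(log_probs))                  # lower is better
```

When the features are (a scaled copy of) the skills themselves, each state is easy to attribute to its skill and the loss is small; mismatched feature-skill pairings drive the loss up, which is the "skill discriminability" the summary refers to.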

📝 Abstract
Self-supervised feature learning and pretraining methods in reinforcement learning (RL) often rely on information-theoretic principles, termed mutual information skill learning (MISL). These methods aim to learn a representation of the environment while also incentivizing exploration thereof. However, the role of the representation and mutual information parametrization in MISL is not yet well understood theoretically. Our work investigates MISL through the lens of identifiable representation learning by focusing on the Contrastive Successor Features (CSF) method. We prove that CSF can provably recover the environment's ground-truth features up to a linear transformation due to the inner product parametrization of the features and skill diversity in a discriminative sense. This first identifiability guarantee for representation learning in RL also helps explain the implications of different mutual information objectives and the downsides of entropy regularizers. We empirically validate our claims in MuJoCo and DeepMind Control and show how CSF provably recovers the ground-truth features both from states and pixels.
Problem

Research questions and friction points this paper is trying to address.

Understand the role of the representation in mutual-information skill learning
Prove identifiability of ground-truth features in reinforcement learning
Validate feature recovery in MuJoCo and DeepMind Control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses the Contrastive Successor Features (CSF) method
Recovers ground-truth features up to a linear transformation
Validated in MuJoCo and DeepMind Control