🤖 AI Summary
This work addresses a limitation of existing unsupervised skill discovery methods: they overlook the geometric symmetries of the environment, leading to behavioral redundancy and poor sample efficiency. To overcome this, we propose a symmetry-aware skill discovery framework that explicitly embeds group structure into the objective, restricting optimization to the subspace of equivariant policies and group-invariant scoring functions. We introduce a group-invariant Wasserstein dependency measure and show that this restriction incurs no loss of optimality. The scoring function is parameterized via group Fourier representations, and intrinsic rewards are defined through the alignment of equivariant latent features, ensuring that skills generalize systematically under group transformations. Experiments demonstrate that our approach significantly improves state-space coverage and downstream task learning efficiency on both state-based and pixel-based locomotion benchmarks.
📝 Abstract
Unsupervised skill discovery aims to acquire behavior primitives that improve exploration and accelerate downstream task learning. However, existing approaches often ignore the geometric symmetries of physical environments, leading to redundant behaviors and sample inefficiency. To address this, we introduce Group-Invariant Skill Discovery (GISD), a framework that explicitly embeds group structure into the skill discovery objective. Our approach is grounded in a theoretical guarantee: we prove that in group-symmetric environments, the standard Wasserstein dependency measure admits a globally optimal solution comprising an equivariant policy and a group-invariant scoring function. Motivated by this, we formulate the Group-Invariant Wasserstein dependency measure, which restricts the optimization to this symmetry-aware subspace without loss of optimality. Practically, we parameterize the scoring function using a group Fourier representation and define the intrinsic reward via the alignment of equivariant latent features, ensuring that the discovered skills generalize systematically under group transformations. Experiments on state-based and pixel-based locomotion benchmarks demonstrate that GISD achieves broader state-space coverage and improved efficiency in downstream task learning compared to a strong baseline.
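To make the group-invariance idea concrete, here is a minimal illustrative sketch (not the paper's implementation): for a cyclic rotation group C4 acting on a 2-D state, averaging an arbitrary skill-conditioned score over the group orbit yields a scoring function that is invariant by construction. The names `rotate`, `score`, and `invariant_score` and the linear scoring head are hypothetical; the paper instead uses a group Fourier parameterization to achieve the same invariance.

```python
import numpy as np

def rotate(state, k):
    """Apply the k-th element of the cyclic group C4 (multiples of 90°
    rotations) to a 2-D state; a stand-in for the environment's symmetry group."""
    c, s = np.cos(k * np.pi / 2), np.sin(k * np.pi / 2)
    return np.array([[c, -s], [s, c]]) @ state

def score(state, skill, W):
    """Hypothetical skill-conditioned scoring head: a bilinear form
    between the skill vector and a linear feature of the state."""
    return float(skill @ (W @ state))

def invariant_score(state, skill, W, group_size=4):
    """Group-invariant score via orbit averaging: because the group is
    closed under composition, averaging the raw score over all group
    elements makes the result unchanged under any group transformation
    of the input state."""
    return np.mean([score(rotate(state, k), skill, W) for k in range(group_size)])

# Invariance check: the averaged score is identical (up to floating-point
# error) for every rotated copy of the same state.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))   # feature map (illustrative)
z = rng.normal(size=3)        # skill latent (illustrative)
s = np.array([1.0, 2.0])
vals = [invariant_score(rotate(s, k), z, W) for k in range(4)]
assert np.allclose(vals, vals[0])
```

An intrinsic reward in this spirit could then be defined from such an invariant score (e.g., the alignment between the skill latent and the orbit-averaged state feature), so that symmetric states under the group action receive identical rewards for a given skill.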