🤖 AI Summary
Unsupervised skill discovery for high-degree-of-freedom agents faces two coupled challenges: the exploration space grows exponentially with dimensionality, while the manifold of semantically meaningful skills remains sparse. To address this, the paper proposes Reference-Grounded Skill Discovery (RGSD), a framework that grounds the skill latent space in semantics by combining contrastive pretraining with directional clustering of reference trajectories on the unit hypersphere, yielding a semantically aligned skill representation space. This unified framework simultaneously supports skill discovery, imitation learning, and diverse behavior generation. The method comprises four key components: (i) contrastive pretraining of observation-action representations; (ii) embedding of reference demonstration data; (iii) directional clustering in the latent embedding space; and (iv) latent-conditioned policy control. Evaluated on a simulated SMPL humanoid with 359-dimensional observations and 69-dimensional actions, RGSD autonomously discovers structured skills (including walking, running, punching, and side stepping) without explicit supervision, and significantly outperforms imitation-based baselines on downstream tasks.
📝 Abstract
Scaling unsupervised skill discovery algorithms to high-DoF agents remains challenging. As dimensionality increases, the exploration space grows exponentially, while the manifold of meaningful skills remains limited. Semantic meaningfulness therefore becomes essential to effectively guide exploration in high-dimensional spaces. In this work, we present Reference-Grounded Skill Discovery (RGSD), a novel algorithm that grounds skill discovery in a semantically meaningful latent space using reference data. RGSD first performs contrastive pretraining to embed motions on a unit hypersphere, clustering each reference trajectory into a distinct direction. This grounding lets skill discovery proceed as both imitation of reference behaviors and discovery of semantically related, diverse new behaviors. On a simulated SMPL humanoid with 359-D observations and 69-D actions, RGSD learns structured skills including walking, running, punching, and side stepping, and also discovers related novel behaviors. In downstream control tasks, RGSD outperforms imitation-based skill acquisition baselines. Our results suggest that lightweight reference-guided grounding offers a practical path to discovering semantically rich and structured skills in high-DoF systems.
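The core mechanism described in the abstract, contrastive pretraining that places motion embeddings on a unit hypersphere so samples from the same reference trajectory cluster into one direction, can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the function names (`normalize`, `contrastive_loss`), the InfoNCE-style loss form, and the temperature value are not taken from the paper's actual implementation.

```python
import numpy as np

def normalize(x, axis=-1, eps=1e-8):
    # Project embeddings onto the unit hypersphere (L2-normalize rows).
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def contrastive_loss(z, traj_ids, temperature=0.1):
    """InfoNCE-style contrastive loss (illustrative, not the paper's exact loss).

    Samples sharing a trajectory id are positives; on the unit sphere,
    cosine similarity reduces to a dot product, so minimizing this loss
    pulls each trajectory's embeddings toward a common direction.
    """
    z = normalize(z)
    sim = z @ z.T / temperature            # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)         # exclude self-pairs from both
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = traj_ids[:, None] == traj_ids[None, :]
    np.fill_diagonal(pos, False)           # anchor is not its own positive
    # Average negative log-probability of positive pairs.
    return -logp[pos].mean()
```

With embeddings for two trajectories that already point in distinct directions, the loss is near zero; when trajectory labels and directions are mismatched, it is large, which is the clustering pressure the abstract describes.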