Balancing Both Behavioral Quality and Diversity in Unsupervised Skill Discovery

📅 2023-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Unsupervised skill discovery must balance behavioral exploration against skill diversity, which is especially hard when agent dynamics are complex and candidate skills are difficult to tell apart, as in robot behavior discovery. To address this, the paper proposes Contrastive Multi-objective Skill Discovery (ComSD), which jointly optimizes a diversity reward based on contrastive learning and a particle-based exploration reward, combined through a dynamic weighting mechanism in a reward-free setting. The diversity reward drives the agent to discern existing skills, while the exploration reward pushes it to reach and learn new behaviors. Evaluated on 32 downstream adaptation tasks, ComSD achieves state-of-the-art performance and generates diverse behaviors at different exploratory levels for complex multi-joint robots.
📝 Abstract
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Unsupervised skill discovery seeks to discover diverse and exploratory skills without extrinsic reward, so that the discovered skills can be adapted efficiently to multiple downstream tasks in various ways. However, recent advanced methods struggle to balance behavioral exploration and diversity, particularly when the agent dynamics are complex and potential skills are hard to discern (e.g., robot behavior discovery). In this paper, we propose Contrastive multi-objective Skill Discovery (ComSD), which discovers exploratory and diverse behaviors through a novel intrinsic incentive named the contrastive multi-objective reward. It contains a novel diversity reward based on contrastive learning, which drives agents to discern existing skills, and a particle-based exploration reward, which drives them to access and learn new behaviors. Moreover, a novel dynamic weighting mechanism between these two rewards is proposed to balance diversity and exploration, which further improves behavioral quality. Extensive experiments and analysis demonstrate that ComSD can generate diverse behaviors at different exploratory levels for complex multi-joint robots, enabling state-of-the-art performance across 32 challenging downstream adaptation tasks, which recent advanced methods cannot achieve. Code will be released after publication.
Problem

Research questions and friction points this paper is trying to address.

Balancing skill diversity and exploration
Enhancing unsupervised skill discovery
Improving adaptation to downstream tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive learning-based reward for diversity
Particle-based exploration for far-reaching states
Dynamic weighting balances exploration and diversity
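
The three ingredients listed above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which the page says is unreleased: the function names, the InfoNCE-style form of the diversity score, the k-nearest-neighbor form of the particle-based reward, and the scalar weight `beta` are all assumptions chosen for illustration. In particular, the paper's dynamic weighting rule is not described here, so `beta` is only a stand-in that the caller supplies.

```python
import numpy as np


def knn_exploration_reward(states, k=3):
    """Particle-based exploration sketch (assumed form): states whose k-th
    nearest neighbor in the batch is far away receive a larger reward."""
    diffs = states[:, None, :] - states[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)          # pairwise distances, (N, N)
    # After sorting each row, column 0 is the zero self-distance,
    # so column k holds the distance to the k-th nearest neighbor.
    kth = np.sort(dists, axis=1)[:, k]
    return np.log1p(kth)


def contrastive_diversity_reward(state_emb, skill_emb, temperature=0.5):
    """Contrastive diversity sketch (assumed InfoNCE-style form): the log
    probability that a state embedding picks out its own skill embedding
    among all skills in the batch. Values are <= 0; higher is better."""
    s = state_emb / np.linalg.norm(state_emb, axis=1, keepdims=True)
    z = skill_emb / np.linalg.norm(skill_emb, axis=1, keepdims=True)
    logits = s @ z.T / temperature                  # cosine similarities, (N, N)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return np.diag(log_softmax)


def combined_reward(r_div, r_exp, beta):
    """Weighted sum of the two rewards. `beta` (scalar or per-sample array)
    is a hypothetical placeholder for the paper's dynamic weight."""
    return r_div + beta * r_exp


# Toy usage with random data; the shapes are arbitrary.
rng = np.random.default_rng(0)
states = rng.normal(size=(8, 4))
state_emb = rng.normal(size=(8, 16))
skill_emb = rng.normal(size=(8, 16))
reward = combined_reward(
    contrastive_diversity_reward(state_emb, skill_emb),
    knn_exploration_reward(states, k=3),
    beta=0.5,
)
```

In this sketch a larger `beta` favors reaching new states, while a smaller one favors keeping skills distinguishable, which is the trade-off the dynamic weighting is meant to manage.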
Xin Liu
State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Yaran Chen
State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Dongbin Zhao
Institute of Automation, Chinese Academy of Sciences
Deep Reinforcement Learning, Adaptive Dynamic Programming, Game AI, Smart Driving, Robotics