Cross-domain Random Pre-training with Prototypes for Reinforcement Learning

📅 2023-02-11
🏛️ arXiv.org
📈 Citations: 8
Influential: 1

🤖 AI Summary
To address the challenge of unsupervised cross-domain pre-training for reinforcement learning on continuous visual control tasks, this paper proposes CRPTpro, a framework that pairs decoupled random data collection, which quickly builds a cross-domain pre-training dataset, with a prototype-based self-supervised algorithm that trains a visual encoder generic across domains. The pre-trained encoder is applied to downstream tasks without fine-tuning, in domains both seen and unseen during pre-training. Evaluated across eight continuous visual-control domains covering balance control, robot locomotion, and manipulation, CRPTpro outperforms the next-best method, Proto-RL(C), on 11 of 12 cross-domain downstream tasks while using only 54.5% of the wall-clock pre-training time. The authors report state-of-the-art pre-training performance with greatly improved pre-training efficiency.
📝 Abstract
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Unsupervised cross-domain Reinforcement Learning (RL) pre-training shows great potential for challenging continuous visual control but poses a big challenge. In this paper, we propose Cross-domain Random Pre-Training with prototypes (CRPTpro), a novel, efficient, and effective self-supervised cross-domain RL pre-training framework. CRPTpro decouples data sampling from encoder pre-training, proposing decoupled random collection to easily and quickly generate a qualified cross-domain pre-training dataset. Moreover, a novel prototypical self-supervised algorithm is proposed to pre-train an effective visual encoder that is generic across different domains. Without finetuning, the cross-domain encoder can be implemented for challenging downstream tasks defined in different domains, either seen or unseen. Compared with recent advanced methods, CRPTpro achieves better performance on downstream policy learning without extra training on exploration agents for data collection, greatly reducing the burden of pre-training. We conduct extensive experiments across eight challenging continuous visual-control domains, including balance control, robot locomotion, and manipulation. CRPTpro significantly outperforms the next best Proto-RL(C) on 11/12 cross-domain downstream tasks with only 54.5% wall-clock pre-training time (implementation: https://github.com/liuxin0824/CRPTpro), exhibiting state-of-the-art pre-training performance with greatly improved pre-training efficiency.
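The idea of decoupling data sampling from encoder pre-training can be illustrated with a minimal Python sketch. It is not taken from the CRPTpro repository: `ToyDomain`, `collect_random`, the observation size, and the action range of [-1, 1] are all hypothetical stand-ins. The sketch only shows the general pattern of gathering observations from several domains with uniformly random actions, with no exploration agent being trained, and storing them in one dataset that an encoder could later be trained on.

```python
import random


class ToyDomain:
    """Hypothetical stand-in for one visual control domain (assumption)."""

    def __init__(self, name, action_dim, seed):
        self.name = name
        self.action_dim = action_dim
        self._rng = random.Random(seed)

    def _observe(self):
        # A 4-value list stands in for an image observation.
        return [self._rng.random() for _ in range(4)]

    def reset(self):
        return self._observe()

    def step(self, action):
        # Toy dynamics: the action is ignored and a fresh observation returned.
        return self._observe()


def collect_random(domains, steps_per_domain, episode_len, seed=0):
    """Collect (domain name, observation) pairs using uniformly random actions."""
    rng = random.Random(seed)
    dataset = []
    for domain in domains:
        obs = domain.reset()
        for t in range(steps_per_domain):
            dataset.append((domain.name, obs))
            # Random action: no policy or exploration agent is involved.
            action = [rng.uniform(-1.0, 1.0) for _ in range(domain.action_dim)]
            obs = domain.step(action)
            if (t + 1) % episode_len == 0:
                obs = domain.reset()
    # Mix domains so later mini-batches are cross-domain.
    rng.shuffle(dataset)
    return dataset


domains = [ToyDomain("balance", 1, seed=1), ToyDomain("locomotion", 6, seed=2)]
dataset = collect_random(domains, steps_per_domain=10, episode_len=5)
```

Because collection finishes before any encoder update, the dataset can be generated once and reused, which is one plausible reading of why the abstract describes collection as quick and cheap.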
Problem

Research questions and friction points this paper is trying to address.

Cross-domain RL pre-training challenge
Efficient self-supervised visual encoder
Improved downstream task performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupled random collection for datasets
Prototypical self-supervised algorithm for encoders
Efficient cross-domain pre-training framework
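The page does not spell out the prototypical objective, so the following Python sketch shows only a generic prototype-assignment loss of the kind used in prototype-based self-supervised learning, not the paper's algorithm. Everything here is an assumption made for illustration: the function names, the temperature of 0.1, and the use of a plain softmax for the target assignment (methods in this family often use a balanced assignment step instead). Two embeddings of related observations are each compared against a set of prototype vectors, and the loss is the cross-entropy between their soft assignments.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def softmax(scores, temperature):
    """Numerically stable softmax over temperature-scaled scores."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_probs(embedding, prototypes, temperature):
    """Soft assignment of an embedding to prototypes via cosine similarity."""
    z = l2_normalize(embedding)
    scores = [
        sum(a * b for a, b in zip(z, l2_normalize(proto)))
        for proto in prototypes
    ]
    return softmax(scores, temperature)


def prototype_loss(emb_pred, emb_target, prototypes, temperature=0.1):
    """Cross-entropy between the target view's assignment and the predicted one."""
    pred = prototype_probs(emb_pred, prototypes, temperature)
    target = prototype_probs(emb_target, prototypes, temperature)
    return -sum(t * math.log(p) for t, p in zip(target, pred))


prototypes = [[1.0, 0.0], [0.0, 1.0]]
loss_similar = prototype_loss([1.0, 0.1], [1.0, 0.05], prototypes)
loss_different = prototype_loss([1.0, 0.1], [0.05, 1.0], prototypes)
```

In this toy setup the loss is small when both embeddings fall near the same prototype and large when they fall near different ones, which is the signal an encoder would be trained to minimize.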
Xin Liu
Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Yaran Chen
Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Haoran Li
Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Boyu Li
Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Dongbin Zhao
Institute of Automation, Chinese Academy of Sciences
Deep Reinforcement Learning, Adaptive Dynamic Programming, Game AI, Smart driving, robotics