Cross-domain Random Pre-training with Prototypes for Reinforcement Learning

📅 2023-02-11

🏛️ arXiv.org

📈 Citations: 8

✨ Influential: 1

🤖 AI Summary

This paper addresses pre-training for unsupervised cross-domain reinforcement learning on continuous visual control tasks. It proposes CRPTpro, a framework that pairs decoupled random data collection, which builds the cross-domain pre-training dataset without training exploration agents, with a prototypical self-supervised algorithm that pre-trains a visual encoder generic across domains. The pre-trained encoder is used without fine-tuning on downstream tasks in both seen and unseen domains. In experiments across eight continuous visual-control domains, including balance control, robot locomotion, and manipulation, CRPTpro outperforms the next-best method, Proto-RL(C), on 11 of 12 cross-domain downstream tasks while using only 54.5% of its wall-clock pre-training time. The authors report state-of-the-art pre-training performance with greatly improved pre-training efficiency.

📝 Abstract

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Unsupervised cross-domain Reinforcement Learning (RL) pre-training shows great potential for challenging continuous visual control but poses a big challenge. In this paper, we propose Cross-domain Random Pre-Training with prototypes (CRPTpro), a novel, efficient, and effective self-supervised cross-domain RL pre-training framework. CRPTpro decouples data sampling from encoder pre-training, proposing decoupled random collection to easily and quickly generate a qualified cross-domain pre-training dataset. Moreover, a novel prototypical self-supervised algorithm is proposed to pre-train an effective visual encoder that is generic across different domains. Without finetuning, the cross-domain encoder can be implemented for challenging downstream tasks defined in different domains, either seen or unseen. Compared with recent advanced methods, CRPTpro achieves better performance on downstream policy learning without extra training on exploration agents for data collection, greatly reducing the burden of pre-training. We conduct extensive experiments across eight challenging continuous visual-control domains, including balance control, robot locomotion, and manipulation. CRPTpro significantly outperforms the next best Proto-RL(C) on 11/12 cross-domain downstream tasks with only 54.5% wall-clock pre-training time (implementation: https://github.com/liuxin0824/CRPTpro), exhibiting state-of-the-art pre-training performance with greatly improved pre-training efficiency.
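
The idea of decoupling data sampling from encoder pre-training can be illustrated with a small sketch. This is not the authors' code (see the repository linked in the abstract for that). `ToyDomain` and `collect_random` are hypothetical names, and the toy domains return short lists of numbers rather than rendered frames. The sketch shows only the structure the abstract describes: one shared dataset is filled by random actions in several domains, with no exploration agent being trained, and any encoder training would start afterwards.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyDomain:
    """Hypothetical stand-in for one visual-control domain (not a real simulator)."""
    name: str
    action_dim: int
    obs_size: int = 4
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def reset(self):
        # A real domain would return a rendered image; this returns noise.
        return [self.rng.random() for _ in range(self.obs_size)]

    def step(self, action):
        # Shift the noise by the mean action so actions have some effect.
        shift = sum(action) / len(action)
        return [self.rng.random() + shift for _ in range(self.obs_size)]


def collect_random(domains, steps_per_domain, seed=0):
    """Fill one shared dataset using uniformly random actions in every domain."""
    rng = random.Random(seed)
    dataset = []
    for dom in domains:
        obs = dom.reset()
        for _ in range(steps_per_domain):
            dataset.append((dom.name, obs))
            # Actions are sampled at random; no policy is learned here.
            action = [rng.uniform(-1.0, 1.0) for _ in range(dom.action_dim)]
            obs = dom.step(action)
    return dataset


domains = [
    ToyDomain("balance", action_dim=1),
    ToyDomain("locomotion", action_dim=6),
]
dataset = collect_random(domains, steps_per_domain=100)
# Encoder pre-training would begin only at this point, reading from `dataset`.
print(len(dataset))
```

Because collection never depends on the encoder or on a learned policy, the dataset can be generated once and reused, which is consistent with the reduced pre-training burden the abstract reports.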

Problem

Research questions and friction points this paper is trying to address.

Cross-domain RL pre-training challenge

Efficient self-supervised visual encoder

Improved downstream task performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupled random collection for datasets

Prototypical self-supervised algorithm for encoders

Efficient cross-domain pre-training framework
