🤖 AI Summary
In vision-based motor policy learning, high-dimensional visual inputs and agile control outputs lead to poor sample efficiency and large simulation-to-reality (Sim2Real) gaps. To address these challenges, this paper proposes a two-stage decoupled framework: first, a temporally aware, compact visual representation is learned via a masked Transformer architecture and temporal contrastive learning; second, an oracle teacher policy with privileged access to the full state provides hierarchical supervision and progressive guidance for efficient knowledge transfer. The key innovation lies in the synergistic integration of oracle-guided learning with masked contrastive representation learning, which substantially enhances representation discriminability and policy generalization without increasing reliance on real-world data. Experiments demonstrate that the method achieves superior sample efficiency and asymptotic performance on both simulated and real robotic platforms, and consistently outperforms existing baselines in perceptually complex and multi-task scenarios.
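The summary does not give the exact form of the temporal contrastive objective, but a common instantiation is an InfoNCE loss that pulls embeddings of temporally adjacent frames together while treating other frames in the batch as negatives. The sketch below is illustrative only; the function name, temperature value, and pairing scheme are assumptions, not details from the paper.

```python
import numpy as np

def temporal_info_nce(anchors, positives, temperature=0.1):
    """Illustrative InfoNCE loss: anchors[i] should match positives[i]
    (e.g. embeddings of temporally adjacent frames); the other positives
    in the batch act as negatives. Not the paper's exact objective."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # match i-th anchor to i-th positive

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = temporal_info_nce(z, z + 0.01 * rng.normal(size=(8, 16)))
shuffled = temporal_info_nce(z, rng.normal(size=(8, 16)))
```

Under such a loss, embeddings of genuine temporal pairs score lower (better) than mismatched ones, which is what makes the learned representation temporally aware.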
📝 Abstract
A prevailing approach for learning visuomotor policies is to employ reinforcement learning to map high-dimensional visual observations directly to action commands. However, the combination of high-dimensional visual inputs and agile maneuver outputs leads to long-standing challenges, including low sample efficiency and significant sim-to-real gaps. To address these issues, we propose Oracle-Guided Masked Contrastive Reinforcement Learning (OMC-RL), a novel framework designed to improve the sample efficiency and asymptotic performance of visuomotor policy learning. OMC-RL explicitly decouples the learning process into two stages: an upstream representation learning stage and a downstream policy learning stage. In the upstream stage, a masked Transformer module is trained with temporal modeling and contrastive learning to extract temporally aware and task-relevant representations from sequential visual inputs. After training, the learned encoder is frozen and used to extract visual representations from consecutive frames, while the Transformer module is discarded. In the downstream stage, an oracle teacher policy with privileged access to global state information supervises the agent during early training to provide informative guidance and accelerate early policy learning. This guidance is gradually reduced to allow independent exploration as training progresses. Extensive experiments in simulated and real-world environments demonstrate that OMC-RL achieves superior sample efficiency and asymptotic policy performance, while also improving generalization across diverse and perceptually complex scenarios.
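The abstract says the oracle teacher's guidance is "gradually reduced" as training progresses but does not specify the schedule or loss. One simple way to realize this is a decaying coefficient on a teacher-imitation term added to the RL objective; the sketch below assumes a linear decay, and the names `guidance_weight`, `anneal_steps`, and `total_loss` are hypothetical, not from the paper.

```python
def guidance_weight(step, anneal_steps=100_000, w0=1.0):
    """Linearly decay the teacher-imitation coefficient to zero so the
    student shifts from supervised guidance to independent exploration.
    (Illustrative schedule; the paper's actual decay is unspecified.)"""
    return w0 * max(0.0, 1.0 - step / anneal_steps)

def total_loss(rl_loss, imitation_loss, step):
    # Combined objective: the RL term plus an annealed oracle-guidance term.
    return rl_loss + guidance_weight(step) * imitation_loss
```

Early in training the imitation term dominates alongside the RL loss; once `step` passes `anneal_steps`, the weight is exactly zero and the student learns from rewards alone.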