SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL

📅 2025-06-04
🤖 AI Summary
To address core challenges in real-world reinforcement learning (RL) for high-degree-of-freedom mobile manipulators, namely low sample efficiency, unsafe exploration, and brittle sim-to-real transfer, this paper proposes SLAC, a framework that pretrains a task-agnostic latent action space in a low-fidelity simulator. Methodologically, it uses a customized unsupervised skill discovery objective to promote temporal abstraction, motion disentanglement, and safety, and pairs the resulting latent action space with an off-policy RL algorithm tailored to it, enabling demonstration-free, purely online learning in the real world. Its key contributions are: (i) combining low-fidelity simulation with a pretrained latent action space to establish a robust sim-to-real transfer mechanism; and (ii) eliminating reliance on expert demonstrations or handcrafted behavior priors. Evaluated on bimanual, contact-rich whole-body manipulation tasks, the approach achieves state-of-the-art performance with under one hour of real-world interaction.

📝 Abstract
Building capable household and industrial robots requires mastering the control of versatile, high-degree-of-freedom (DoF) systems such as mobile manipulators. While reinforcement learning (RL) holds promise for autonomously acquiring robot control policies, scaling it to high-DoF embodiments remains challenging. Direct RL in the real world demands both safe exploration and high sample efficiency, which are difficult to achieve in practice. Sim-to-real RL, on the other hand, is often brittle due to the reality gap. This paper introduces SLAC, a method that renders real-world RL feasible for complex embodiments by leveraging a low-fidelity simulator to pretrain a task-agnostic latent action space. SLAC trains this latent action space via a customized unsupervised skill discovery method designed to promote temporal abstraction, disentanglement, and safety, thereby facilitating efficient downstream learning. Once a latent action space is learned, SLAC uses it as the action interface for a novel off-policy RL algorithm to autonomously learn downstream tasks through real-world interactions. We evaluate SLAC against existing methods on a suite of bimanual mobile manipulation tasks, where it achieves state-of-the-art performance. Notably, SLAC learns contact-rich whole-body tasks in under an hour of real-world interactions, without relying on any demonstrations or hand-crafted behavior priors. More information, code, and videos at robo-rl.github.io
Problem

Research questions and friction points this paper is trying to address.

Mastering control of high-DoF robots for real-world tasks
Bridging sim-to-real gap in reinforcement learning for robotics
Enabling efficient real-world RL without demonstrations or priors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pretrains a task-agnostic latent action space in a low-fidelity simulator
Uses unsupervised skill discovery to promote temporal abstraction, disentanglement, and safety
Learns downstream tasks via off-policy RL directly in the real world
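The two-stage pipeline above can be sketched conceptually: a decoder pretrained in simulation maps low-dimensional latent actions to joint commands, and the downstream real-world policy acts only in the latent space, committing to each latent for several control steps (temporal abstraction). This is a minimal illustrative sketch, not SLAC's actual implementation; `LatentDecoder`, `rollout`, and all parameters are hypothetical stand-ins.

```python
import random


class LatentDecoder:
    """Hypothetical frozen decoder standing in for the sim-pretrained
    latent action space: maps a low-dim latent to joint commands."""

    def __init__(self, latent_dim, num_joints, seed=0):
        rng = random.Random(seed)
        # A fixed random linear map plays the role of the pretrained network.
        self.w = [[rng.uniform(-1.0, 1.0) for _ in range(latent_dim)]
                  for _ in range(num_joints)]

    def __call__(self, z):
        # One joint command per row of the (frozen) weight matrix.
        return [sum(wi * zi for wi, zi in zip(row, z)) for row in self.w]


def rollout(policy, decoder, steps_per_latent=5, horizon=20):
    """The downstream policy picks a new latent action only every
    `steps_per_latent` environment steps (temporal abstraction);
    the frozen decoder turns each latent into joint commands."""
    commands = []
    z = None
    for t in range(horizon):
        if t % steps_per_latent == 0:
            z = policy(t)            # real-world RL acts in latent space
        commands.append(decoder(z))  # decoder stays fixed after pretraining
    return commands
```

Because the decoder is frozen after simulation pretraining, the real-world learner only has to explore the low-dimensional latent space, which is one way such a design can keep exploration sample-efficient and safe.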
Jiaheng Hu
UT-Austin
Robot Learning · Reinforcement Learning · Robotics · Mobile Manipulation
Peter Stone
The University of Texas at Austin, Sony AI
Roberto Martín-Martín
The University of Texas at Austin, Amazon