SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL

📅 2025-06-04
🤖 AI Summary
To address core challenges in real-world reinforcement learning (RL) for high-degree-of-freedom mobile manipulators, namely low sample efficiency, unsafe exploration, and brittle sim-to-real transfer, this paper proposes SLAC, a framework that pretrains a task-agnostic latent action space in a low-fidelity simulator. Methodologically, it uses a customized unsupervised skill discovery objective to promote temporal abstraction, motion disentanglement, and safety, and pairs the resulting latent action space with an off-policy RL algorithm tailored to it, enabling demonstration-free, purely online learning in the real world. Its key contributions are: (i) combining low-fidelity simulation with a pretrained latent action space to establish a robust sim-to-real transfer mechanism; and (ii) eliminating reliance on expert demonstrations or handcrafted behavior priors. Evaluated on bimanual, contact-rich whole-body manipulation tasks, the approach achieves state-of-the-art performance with under one hour of real-world interaction.

📝 Abstract
Building capable household and industrial robots requires mastering the control of versatile, high-degree-of-freedom (DoF) systems such as mobile manipulators. While reinforcement learning (RL) holds promise for autonomously acquiring robot control policies, scaling it to high-DoF embodiments remains challenging. Direct RL in the real world demands both safe exploration and high sample efficiency, which are difficult to achieve in practice. Sim-to-real RL, on the other hand, is often brittle due to the reality gap. This paper introduces SLAC, a method that renders real-world RL feasible for complex embodiments by leveraging a low-fidelity simulator to pretrain a task-agnostic latent action space. SLAC trains this latent action space via a customized unsupervised skill discovery method designed to promote temporal abstraction, disentanglement, and safety, thereby facilitating efficient downstream learning. Once a latent action space is learned, SLAC uses it as the action interface for a novel off-policy RL algorithm to autonomously learn downstream tasks through real-world interactions. We evaluate SLAC against existing methods on a suite of bimanual mobile manipulation tasks, where it achieves state-of-the-art performance. Notably, SLAC learns contact-rich whole-body tasks in under an hour of real-world interactions, without relying on any demonstrations or hand-crafted behavior priors. More information, code, and videos at robo-rl.github.io
Problem

Research questions and friction points this paper is trying to address.

Mastering control of high-DoF robots for real-world tasks
Bridging sim-to-real gap in reinforcement learning for robotics
Enabling efficient real-world RL without demonstrations or priors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pretrains a task-agnostic latent action space in a low-fidelity simulator
Uses unsupervised skill discovery to promote temporal abstraction, disentanglement, and safety
Learns downstream tasks via off-policy RL directly in the real world
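The two-stage pipeline above can be sketched conceptually: a decoder pretrained in simulation maps low-dimensional latent actions to joint commands, and the downstream real-world policy acts only in the latent space, committing to each latent for several control steps (temporal abstraction). This is a minimal illustrative sketch, not SLAC's actual implementation; `LatentDecoder`, `rollout`, and all parameters are hypothetical stand-ins.

```python
import random


class LatentDecoder:
    """Hypothetical frozen decoder standing in for the sim-pretrained
    latent action space: maps a low-dim latent to joint commands."""

    def __init__(self, latent_dim, num_joints, seed=0):
        rng = random.Random(seed)
        # A fixed random linear map plays the role of the pretrained network.
        self.w = [[rng.uniform(-1.0, 1.0) for _ in range(latent_dim)]
                  for _ in range(num_joints)]

    def __call__(self, z):
        # One joint command per row of the (frozen) weight matrix.
        return [sum(wi * zi for wi, zi in zip(row, z)) for row in self.w]


def rollout(policy, decoder, steps_per_latent=5, horizon=20):
    """The downstream policy picks a new latent action only every
    `steps_per_latent` environment steps (temporal abstraction);
    the frozen decoder turns each latent into joint commands."""
    commands = []
    z = None
    for t in range(horizon):
        if t % steps_per_latent == 0:
            z = policy(t)            # real-world RL acts in latent space
        commands.append(decoder(z))  # decoder stays fixed after pretraining
    return commands
```

Because the decoder is frozen after simulation pretraining, the real-world learner only has to explore the low-dimensional latent space, which is one way such a design can keep exploration sample-efficient and safe.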
Jiaheng Hu
UT-Austin
Robot Learning · Reinforcement Learning · Robotics · Mobile Manipulation
Peter Stone
The University of Texas at Austin, Sony AI
Roberto Martín-Martín
The University of Texas at Austin, Amazon