High-Performance Reinforcement Learning on Spot: Optimizing Simulation Parameters with Distributional Measures

📅 2025-04-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the substantial sim-to-real gap and the difficulty of calibrating unknown physical parameters (e.g., friction, inertia) for Boston Dynamics' Spot robot, this paper proposes a distribution-discrepancy-driven automatic calibration framework. It quantifies the distributional shift between simulated and real locomotion data using Wasserstein distance and Maximum Mean Discrepancy (MMD), and feeds these metrics as optimization objectives to CMA-ES for tuning unobservable simulation parameters. This is the first work to achieve end-to-end deployment of reinforcement learning policies on physical Spot hardware, supporting multi-gait locomotion, including aerial phases. Experiments demonstrate locomotion at over 5.2 m/s, more than triple the speed of the factory controller, along with significantly improved robustness on slippery terrain, disturbance rejection, and agility. The complete codebase and training pipeline are open-sourced.
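As a rough illustration of the two discrepancy measures named above, the sketch below computes a 1-D Wasserstein distance and an RBF-kernel MMD between a "simulated" and a "real" sample. The function name `mmd_rbf` and the kernel bandwidth are our own choices for the example, not details from the paper:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def mmd_rbf(x, y, bandwidth=1.0):
    """Squared Maximum Mean Discrepancy between 1-D samples x and y
    under a Gaussian (RBF) kernel."""
    def k(a, b):
        d = a[:, None] - b[None, :]
        return np.exp(-d ** 2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
sim = rng.normal(0.0, 1.0, 500)   # stand-in for simulated rollout data
real = rng.normal(0.5, 1.2, 500)  # stand-in for hardware-logged data

w = wasserstein_distance(sim, real)  # distributional shift, metric 1
m = mmd_rbf(sim, real)               # distributional shift, metric 2
```

Both values shrink toward zero as the simulated distribution approaches the real one, which is what makes them usable as a calibration objective.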

📝 Abstract
This work presents an overview of the technical details behind a high-performance reinforcement learning policy deployment with the Spot RL Researcher Development Kit for low-level motor access on Boston Dynamics Spot. This represents the first public demonstration of an end-to-end reinforcement learning policy deployed on Spot hardware, with training code publicly available through Nvidia IsaacLab and deployment code available through Boston Dynamics. We utilize Wasserstein Distance and Maximum Mean Discrepancy to quantify the distributional dissimilarity of data collected on hardware and in simulation to measure our sim2real gap. We use these measures as a scoring function for the Covariance Matrix Adaptation Evolution Strategy to optimize simulated parameters that are unknown or difficult to measure from Spot. Our procedure for modeling and training produces high-quality reinforcement learning policies capable of multiple gaits, including a flight phase. We deploy policies capable of over 5.2 m/s locomotion, more than triple Spot's default controller maximum speed, robustness to slippery surfaces, disturbance rejection, and overall agility previously unseen on Spot. We detail our method and release our code to support future work on Spot with the low-level API.
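The calibration procedure described in the abstract pairs a discrepancy score with CMA-ES. The sketch below shows that ask-evaluate-recombine pattern using a toy diagonal evolution strategy in place of a full CMA-ES implementation, with a stand-in `discrepancy` objective; both simplifications (and the parameter target) are ours, not the paper's:

```python
import numpy as np

def discrepancy(params, target=np.array([0.8, 0.3])):
    # Stand-in for the real objective: run the simulator with `params`
    # (e.g. friction, inertia), collect rollouts, and score the
    # Wasserstein/MMD gap to hardware data. Here: distance to a target.
    return np.sum((params - target) ** 2)

def calibrate(x0, sigma=0.5, popsize=16, iters=60, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.asarray(x0, dtype=float)
    for _ in range(iters):
        # "ask": sample candidate parameter sets around the current mean
        pop = mean + sigma * rng.standard_normal((popsize, mean.size))
        # "evaluate": score each candidate by its sim-vs-real discrepancy
        scores = np.array([discrepancy(p) for p in pop])
        # "tell": recombine the best quarter and shrink the step size
        elite = pop[np.argsort(scores)[: popsize // 4]]
        mean = elite.mean(axis=0)
        sigma *= 0.95
    return mean

best = calibrate([0.0, 0.0])  # converges near the (hidden) target params
```

A production version would replace this loop with a CMA-ES library and would adapt the full covariance matrix, which matters when simulation parameters are correlated.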
Problem

Research questions and friction points this paper is trying to address.

Calibrating unknown simulation parameters for Spot to close the sim2real gap
Measuring sim2real gap with Wasserstein Distance and MMD
Deploying high-speed, agile locomotion policies on Spot
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Wasserstein Distance and MMD to measure the sim2real gap
Applies CMA-ES to optimize simulation parameters
Deploys RL policies for high-speed Spot locomotion
A. J. Miller
RAI Institute, Cambridge, MA 02139, USA; Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139, USA
Fangzhou Yu
Michael Brauckmann
Farbod Farshidian
ETH Zürich
Robotics · Control Theory · Machine Learning