🤖 AI Summary
This work addresses the performance degradation caused by domain gaps in sim-to-real transfer, as well as the safety risks and low sample efficiency of fine-tuning policies directly on real robots. To this end, the authors propose SLowRL, a novel framework that, for the first time, integrates low-rank adaptation (LoRA) with runtime safety constraints to enable efficient and safe fine-tuning of simulation-trained reinforcement learning policies on real quadrupedal robots. Using only rank-1 parameter updates, SLowRL recovers simulation-level performance, demonstrating effective hopping and trotting gaits on the Unitree Go2 platform. Compared to a standard PPO baseline, the method reduces fine-tuning time by 46.5% while achieving near-zero safety violations.
📝 Abstract
Sim-to-real transfer of locomotion policies often leads to performance degradation due to the inevitable sim-to-real gap. Naively fine-tuning these policies directly on hardware is problematic, as it risks mechanical failure and suffers from poor sample efficiency. In this paper, we address the challenge of safely and efficiently fine-tuning reinforcement learning (RL) policies for dynamic locomotion tasks. Specifically, we focus on fine-tuning policies learned in simulation directly on hardware, while explicitly enforcing safety constraints. To this end, we introduce SLowRL, a framework that combines Low-Rank Adaptation (LoRA) with training-time safety enforcement via a recovery policy. We evaluate our method both in simulation and on a real Unitree Go2 quadruped robot for jump and trot tasks. Experimental results show that our method achieves a $46.5\%$ reduction in fine-tuning time and near-zero safety violations compared to standard proximal policy optimization (PPO) baselines. Notably, we find that a rank-1 adaptation alone is sufficient to recover pre-trained performance in the real world, while maintaining stable and safe real-world fine-tuning. These results demonstrate the practicality of safe, efficient fine-tuning for dynamic real-world robotic applications.
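To make the rank-1 LoRA idea concrete, here is a minimal sketch of a low-rank update applied to one frozen policy layer. The layer dimensions, scaling factor `alpha`, and variable names are illustrative assumptions, not details from the paper; it only shows the generic LoRA form $W' = W + \frac{\alpha}{r} BA$ with $r = 1$ and the resulting reduction in trainable parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for one policy layer (not from the paper).
d_in, d_out, rank = 48, 128, 1  # rank-1 adaptation
alpha = 1.0                     # LoRA scaling factor (assumed)

# Frozen pre-trained weight obtained from simulation training.
W_frozen = rng.standard_normal((d_out, d_in))

# Trainable low-rank factors. B starts at zero so the adapted
# policy initially matches the pre-trained one exactly.
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))

def adapted_forward(x):
    """Forward pass with the rank-1 update W + (alpha/rank) * B @ A."""
    return W_frozen @ x + (alpha / rank) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# At initialization the adaptation contributes nothing.
assert np.allclose(adapted_forward(x), W_frozen @ x)

# Only A and B are updated during hardware fine-tuning:
# rank * (d_in + d_out) parameters instead of d_in * d_out.
trainable = rank * (d_in + d_out)
full = d_in * d_out
print(f"{trainable} trainable params vs {full} for full fine-tuning")
```

Because `B` is initialized to zero, the adapted policy starts identical to the simulation-trained one, which is what allows fine-tuning on hardware to begin from a safe, known-good behavior.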