🤖 AI Summary
This work investigates which design choices enable successful online reinforcement learning (RL) on real robots. Drawing on 100 real-world training runs across three heterogeneous robotic platforms, with a standard RL framework and controlled ablations, it presents the first large-sample empirical analysis of these choices and shows that several commonly adopted default configurations carry hidden risks. The study identifies critical factors spanning algorithmic settings, system implementation, and experimental protocol, and distills them into a set of robust, easily deployable design guidelines. These guidelines substantially improve training stability and sample efficiency, lowering the engineering barrier to applying online RL across diverse tasks and hardware platforms.
📝 Abstract
We investigate which specific design choices enable successful online reinforcement learning (RL) on physical robots. Across 100 real-world training runs on three distinct robotic platforms, we systematically ablate algorithmic, systems, and experimental decisions that prior work typically leaves implicit. We find that some widely used defaults can be harmful, while a set of robust, readily adoptable design choices within standard RL practice yields stable learning across tasks and hardware. To our knowledge, this is the first large-sample empirical study of such design choices, and it enables practitioners to deploy online RL with markedly lower engineering effort.
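To make the idea of systematically ablating design choices concrete, below is a minimal sketch of an ablation harness over such configurations. The factor names (`update_to_data_ratio`, `action_smoothing`, `reset_protocol`) and the stubbed `run_training_trial` function are illustrative assumptions only; the abstract does not name the factors the paper actually ablates, and a real study would replace the stub with on-hardware training runs.

```python
import itertools
import random

# Hypothetical design choices to sweep. These names are illustrative
# assumptions, not the factors studied in the paper.
DESIGN_CHOICES = {
    "update_to_data_ratio": [1, 4],
    "action_smoothing": [False, True],
    "reset_protocol": ["scripted", "manual"],
}


def run_training_trial(config: dict, seed: int) -> float:
    """Stand-in for one real-robot online RL training run.

    A real study would launch training on hardware with this configuration
    and return a success metric; this stub returns a deterministic
    pseudo-random score so the harness runs as-is.
    """
    rng = random.Random(hash((seed, tuple(sorted(config.items())))))
    return rng.random()


def ablation_grid(choices: dict) -> list[dict]:
    """Enumerate the full factorial grid of configurations."""
    keys = list(choices)
    return [dict(zip(keys, values))
            for values in itertools.product(*(choices[k] for k in keys))]


if __name__ == "__main__":
    results = []
    for config in ablation_grid(DESIGN_CHOICES):
        # Repeat each configuration across seeds; averaging over repeats is
        # what makes a large-sample comparison of defaults meaningful.
        scores = [run_training_trial(config, seed) for seed in range(3)]
        results.append((sum(scores) / len(scores), config))
    for score, config in sorted(results, reverse=True, key=lambda r: r[0]):
        print(f"{score:.2f}  {config}")
```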