🤖 AI Summary
To address unsupervised, real-time adaptation when robots encounter unseen environments during deployment, this paper proposes ROAM (RObust Autonomous Modulation), a framework that uses the perceived value of pretrained behaviors to select and modulate them online. ROAM operates without human intervention: it selects and adapts behaviors from a pretrained library within a single episode at test time. Its core components include a behavior value estimation module, a meta-level behavior modulation network, and an online policy reweighting algorithm. Evaluated in simulation and on a real-world Go1 quadrupedal robot, ROAM handles strongly out-of-distribution tasks, such as moving forward with roller skates on its feet, and adapts over twice as efficiently as prior methods. It is the first approach to enable unsupervised, composable, and real-time deployment driven by a reinforcement-learning-based behavior library.
📝 Abstract
To succeed in the real world, robots must cope with situations that differ from those seen during training. We study the problem of adapting on-the-fly to such novel scenarios during deployment by drawing upon a diverse repertoire of previously learned behaviors. Our approach, RObust Autonomous Modulation (ROAM), introduces a mechanism based on the perceived value of pre-trained behaviors to select and adapt those behaviors to the situation at hand. Crucially, this adaptation process happens entirely within a single episode at test time, without any human supervision. We demonstrate that ROAM enables a robot to adapt rapidly to changes in dynamics both in simulation and on a real Go1 quadruped, even successfully moving forward with roller skates on its feet. When facing a variety of out-of-distribution situations during deployment, our approach adapts over 2x as efficiently as existing methods by effectively choosing and adapting relevant behaviors on-the-fly.
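The abstract describes selecting among pre-trained behaviors by their perceived value. One way to picture that idea is the minimal sketch below. It is not the authors' implementation: it assumes each behavior carries its own policy and value function, and that a behavior is sampled with probability proportional to a softmax over those values at the current state. The names `Behavior`, `softmax_probs`, `select_behavior`, the `temperature` parameter, and the toy "walk" and "skate" value functions are all hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

State = Sequence[float]


class Behavior:
    """A pre-trained behavior: a policy plus its own value estimate (hypothetical structure)."""

    def __init__(self, name: str,
                 policy: Callable[[State], float],
                 value: Callable[[State], float]) -> None:
        self.name = name
        self.policy = policy
        self.value = value


def softmax_probs(values: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn per-behavior values into selection probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def select_behavior(behaviors: Sequence[Behavior], state: State,
                    temperature: float = 1.0,
                    rng: Optional[random.Random] = None) -> int:
    """Sample a behavior index, favoring behaviors that rate the state highly."""
    rng = rng if rng is not None else random.Random()
    probs = softmax_probs([b.value(state) for b in behaviors], temperature)
    return rng.choices(range(len(behaviors)), weights=probs, k=1)[0]


# Toy library (assumption): state[0] is a made-up "slipperiness" feature in [0, 1].
LIBRARY = [
    Behavior("walk", policy=lambda s: 1.0, value=lambda s: -abs(s[0] - 0.0)),
    Behavior("skate", policy=lambda s: 0.3, value=lambda s: -abs(s[0] - 1.0)),
]

if __name__ == "__main__":
    rng = random.Random(0)
    state = [1.0]  # a very slippery situation
    # Re-select at every step so the choice can change within one episode.
    for step in range(5):
        idx = select_behavior(LIBRARY, state, temperature=0.1, rng=rng)
        action = LIBRARY[idx].policy(state)
        print(step, LIBRARY[idx].name, action)
```

Because the selection runs again at every time step, the active behavior can switch mid-episode as the state changes, which matches the single-episode adaptation described above. A lower `temperature` makes the choice closer to a greedy argmax over values.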