🤖 AI Summary
This paper addresses generalization to out-of-distribution (OOD) environments in robot control. We propose SPARC, a single-stage adaptive framework that operates without explicit context information at test time. Unlike conventional two-stage paradigms, where the context encoder and the history adaptation module are trained separately, SPARC jointly learns implicit context inference and end-to-end policy optimization for online adaptation. We evaluate SPARC on the high-fidelity Gran Turismo 7 racing simulator and a wind-disturbed MuJoCo locomotion benchmark. Using a single unified policy, SPARC successfully navigates 100 unseen dynamical configurations and vehicle setups, demonstrating strong zero-shot OOD robustness and generalization. Our approach advances contextual reinforcement learning by introducing a simpler, computationally efficient, and scalable paradigm that eliminates reliance on explicit context inputs or multi-stage adaptation pipelines.
📝 Abstract
Generalization to unseen environments is a significant challenge in the field of robotics and control. In this work, we focus on contextual reinforcement learning, where agents act within environments with varying contexts, such as self-driving cars or quadrupedal robots that need to operate in different terrains or weather conditions than they were trained for. We tackle the critical task of generalizing to out-of-distribution (OOD) settings, without access to explicit context information at test time. Recent work has addressed this problem by training a context encoder and a history adaptation module in separate stages. While promising, this two-phase approach is cumbersome to implement and train. We simplify the methodology and introduce SPARC: single-phase adaptation for robust control. We test SPARC on varying contexts within the high-fidelity racing simulator Gran Turismo 7 and wind-perturbed MuJoCo environments, and find that it achieves reliable and robust OOD generalization.
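To make the setting concrete, the following is a minimal sketch of a policy that receives no explicit context and must infer it from its own recent observation-action history. It is an illustration under stated assumptions, not the SPARC architecture: the class name `HistoryConditionedPolicy`, the layer sizes, the linear-plus-tanh layers, and the toy 1-D point-mass environment with a hidden wind force are all hypothetical, and the weights are left untrained because the training procedure is not described in this summary.

```python
import numpy as np


class HistoryConditionedPolicy:
    """Toy policy that infers a latent from recent (obs, action) pairs.

    Hypothetical illustration only: structure and sizes are assumptions,
    not the architecture from the paper.
    """

    def __init__(self, obs_dim, act_dim, history_len=8, latent_dim=4, seed=0):
        rng = np.random.default_rng(seed)
        self.obs_dim, self.act_dim, self.history_len = obs_dim, act_dim, history_len
        hist_dim = history_len * (obs_dim + act_dim)
        # History adapter: flattened history -> latent context estimate.
        self.w_hist = rng.normal(0.0, 0.1, size=(hist_dim, latent_dim))
        # Policy head: [observation, latent] -> action.
        self.w_pi = rng.normal(0.0, 0.1, size=(obs_dim + latent_dim, act_dim))
        self.reset()

    def reset(self):
        # Zero-filled history at the start of an episode.
        self.history = np.zeros((self.history_len, self.obs_dim + self.act_dim))

    def act(self, obs):
        z = np.tanh(self.history.reshape(-1) @ self.w_hist)
        action = np.tanh(np.concatenate([obs, z]) @ self.w_pi)
        # Slide the window: drop the oldest pair, append the newest.
        self.history = np.vstack([self.history[1:], np.concatenate([obs, action])])
        return action


def rollout(policy, wind, steps=20):
    """1-D point mass pushed by a hidden constant wind (the context)."""
    policy.reset()
    pos, vel = 0.0, 0.0
    for _ in range(steps):
        obs = np.array([pos, vel])
        force = float(policy.act(obs)[0])
        vel += 0.1 * (force + wind)  # wind is never shown to the policy
        pos += 0.1 * vel
    return pos, vel


policy = HistoryConditionedPolicy(obs_dim=2, act_dim=1)
final_in_dist = rollout(policy, wind=0.5)  # assumed "training-like" context
final_ood = rollout(policy, wind=3.0)      # assumed unseen, stronger wind
```

In this sketch the history adapter and the policy head sit in one forward pass, so a single training loop could in principle update both together, which is the general idea a single-phase approach relies on. A two-stage pipeline would instead first train an encoder on the true context and only afterwards fit a history module to imitate it.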