🤖 AI Summary
To address the limited cross-environment generalization and online adaptation of generalist agents in multi-domain tasks, this paper proposes an extended In-Context Reinforcement Learning (ICRL) framework based on Algorithm Distillation. It is the first to scale ICRL to non-toy, multi-domain control tasks, establishing a cross-domain model with a fixed action space. Instead of conventional expert distillation, the method employs Algorithm Distillation for universal policy learning, integrated with policy-conditioned modeling, offline pretraining, and online fine-tuning. Experiments demonstrate strong cross-domain generalization and real-time adaptation across diverse heterogeneous control tasks, matching the performance of expert-distillation approaches. This work establishes a novel paradigm for scalable, general-purpose decision-making systems.
📝 Abstract
In-Context Reinforcement Learning (ICRL) represents a promising paradigm for developing generalist agents that learn at inference time through trial-and-error interactions, analogous to how large language models adapt contextually, but with a focus on reward maximization. However, the scalability of ICRL beyond toy tasks and single-domain settings remains an open challenge. In this work, we present the first steps toward scaling ICRL by introducing a fixed, cross-domain model capable of learning behaviors through in-context reinforcement learning. Our results demonstrate that Algorithm Distillation, a framework designed to facilitate ICRL, offers a compelling and competitive alternative to expert distillation for constructing versatile action models. These findings highlight the potential of ICRL as a scalable approach for generalist decision-making systems. Code to be released at https://github.com/dunnolab/vintix
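To make the ICRL idea concrete, the sketch below shows the inference-time loop the abstract describes: a model with frozen weights conditions on the full cross-episode history and improves purely through that growing context. All names here (`TwoArmedBandit`, `GreedyContextPolicy`, `icrl_rollout`) are hypothetical stand-ins for illustration, not the paper's actual model or environments.

```python
import random

class TwoArmedBandit:
    """Toy one-step environment: arm 1 pays 1.0, arm 0 pays 0.0."""
    def reset(self):
        return 0  # single dummy observation
    def step(self, action):
        reward = 1.0 if action == 1 else 0.0
        return 0, reward, True  # (next_obs, reward, done)

class GreedyContextPolicy:
    """Stand-in for a pretrained sequence model: its weights never change;
    it only conditions on the cross-episode history passed in as context."""
    def act(self, history, obs):
        if not history:
            return random.randrange(2)  # explore on an empty context
        means = {}
        for _, a, r in history:
            means.setdefault(a, []).append(r)
        untried = [a for a in (0, 1) if a not in means]
        if untried:
            return untried[0]  # try any arm not yet in context
        # exploit the arm with the best empirical mean reward in context
        return max(means, key=lambda a: sum(means[a]) / len(means[a]))

def icrl_rollout(model, env, n_episodes):
    """Run episodes back-to-back, accumulating one shared context."""
    history, returns = [], []
    for _ in range(n_episodes):
        obs, total, done = env.reset(), 0.0, False
        while not done:
            action = model.act(history, obs)
            next_obs, reward, done = env.step(action)
            history.append((obs, action, reward))
            total += reward
            obs = next_obs
        returns.append(total)
    return returns
```

Running `icrl_rollout(GreedyContextPolicy(), TwoArmedBandit(), 20)` yields returns that rise to 1.0 within a few episodes: the "learning" happens entirely in context, with no weight updates, which is the behavior Algorithm Distillation trains a transformer to reproduce across domains.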