🤖 AI Summary
To address the limited cross-environment generalization and online adaptation of generalist agents in multi-domain tasks, this paper proposes an extended In-Context Reinforcement Learning (ICRL) framework based on Algorithm Distillation. It is the first to scale ICRL to non-toy, multi-domain control tasks, establishing a cross-domain model with a fixed action space. Instead of conventional expert distillation, the method employs Algorithm Distillation for universal policy learning, integrated with policy-conditioned modeling, offline pretraining, and online fine-tuning. Experiments demonstrate strong cross-domain generalization and real-time adaptation across diverse heterogeneous control tasks, matching the performance of expert-distillation approaches. This work establishes a novel paradigm for scalable, general-purpose decision-making systems.
📝 Abstract
In-Context Reinforcement Learning (ICRL) represents a promising paradigm for developing generalist agents that learn at inference time through trial-and-error interactions, analogous to how large language models adapt contextually, but with a focus on reward maximization. However, the scalability of ICRL beyond toy tasks and single-domain settings remains an open challenge. In this work, we present the first steps toward scaling ICRL by introducing a fixed, cross-domain model capable of learning behaviors through in-context reinforcement learning. Our results demonstrate that Algorithm Distillation, a framework designed to facilitate ICRL, offers a compelling and competitive alternative to expert distillation for constructing versatile action models. These findings highlight the potential of ICRL as a scalable approach for generalist decision-making systems. Code to be released at https://github.com/dunnolab/vintix
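To make the ICRL idea concrete, the sketch below shows the inference-time loop the abstract describes: a model with frozen weights conditions on the full cross-episode history and improves purely through that growing context. All names here (`TwoArmedBandit`, `GreedyContextPolicy`, `icrl_rollout`) are hypothetical stand-ins for illustration, not the paper's actual model or environments.

```python
import random

class TwoArmedBandit:
    """Toy one-step environment: arm 1 pays 1.0, arm 0 pays 0.0."""
    def reset(self):
        return 0  # single dummy observation
    def step(self, action):
        reward = 1.0 if action == 1 else 0.0
        return 0, reward, True  # (next_obs, reward, done)

class GreedyContextPolicy:
    """Stand-in for a pretrained sequence model: its weights never change;
    it only conditions on the cross-episode history passed in as context."""
    def act(self, history, obs):
        if not history:
            return random.randrange(2)  # explore on an empty context
        means = {}
        for _, a, r in history:
            means.setdefault(a, []).append(r)
        untried = [a for a in (0, 1) if a not in means]
        if untried:
            return untried[0]  # try any arm not yet in context
        # exploit the arm with the best empirical mean reward in context
        return max(means, key=lambda a: sum(means[a]) / len(means[a]))

def icrl_rollout(model, env, n_episodes):
    """Run episodes back-to-back, accumulating one shared context."""
    history, returns = [], []
    for _ in range(n_episodes):
        obs, total, done = env.reset(), 0.0, False
        while not done:
            action = model.act(history, obs)
            next_obs, reward, done = env.step(action)
            history.append((obs, action, reward))
            total += reward
            obs = next_obs
        returns.append(total)
    return returns
```

Running `icrl_rollout(GreedyContextPolicy(), TwoArmedBandit(), 20)` yields returns that rise to 1.0 within a few episodes: the "learning" happens entirely in context, with no weight updates, which is the behavior Algorithm Distillation trains a transformer to reproduce across domains.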