XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning

📅 2024-06-13
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
Current in-context reinforcement learning (ICRL) lacks challenging, standardized benchmarks; existing studies are confined to simple environments and small-scale datasets, hindering progress. To address this, we introduce XLand-100B—the first large-scale, multi-task ICRL benchmark—built upon XLand-MiniGrid. It comprises nearly 30,000 distinct tasks, 100 billion state transitions, and 2.5 billion episodes, accompanied by full learning trajectories, task metadata encodings, and a standardized evaluation protocol. Our key contributions are the first demonstration of ICRL at unprecedented scale, high task diversity, and reproducible, data-scalable construction. Empirical evaluation reveals that state-of-the-art ICRL methods exhibit severe generalization failures on novel, complex tasks. The dataset is publicly released, substantially lowering barriers to large-scale ICRL research and fostering community-wide standardization and methodological advancement.

📝 Abstract
Following the success of the in-context learning paradigm in large-scale language and computer vision models, the recently emerging field of in-context reinforcement learning is experiencing rapid growth. However, its development has been held back by the lack of challenging benchmarks, as all the experiments have been carried out in simple environments and on small-scale datasets. We present XLand-100B, a large-scale dataset for in-context reinforcement learning based on the XLand-MiniGrid environment, as a first step to alleviate this problem. It contains complete learning histories for nearly 30,000 different tasks, covering 100B transitions and 2.5B episodes. It took 50,000 GPU hours to collect the dataset, which is beyond the reach of most academic labs. Along with the dataset, we provide the utilities to reproduce or expand it even further. We also benchmark common in-context RL baselines and show that they struggle to generalize to novel and diverse tasks. With this substantial effort, we aim to democratize research in the rapidly growing field of in-context reinforcement learning and provide a solid foundation for further scaling.
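To make the dataset's headline numbers concrete, here is a minimal, hypothetical sketch of what a "complete learning history" record might look like in code. All field names, shapes, and dtypes below are illustrative assumptions, not the actual XLand-100B schema:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LearningHistory:
    """One task's full training trajectory, stored as a flat stream of
    transitions. Field names and shapes are assumptions for illustration,
    not the real XLand-100B file format."""
    task_id: str
    observations: np.ndarray  # (T, H, W) grid observations along the history
    actions: np.ndarray       # (T,) discrete actions
    rewards: np.ndarray       # (T,) per-step rewards
    dones: np.ndarray         # (T,) episode-boundary flags

    def num_transitions(self) -> int:
        return len(self.actions)

    def num_episodes(self) -> int:
        # Each done=True flag closes one episode in the stream.
        return int(self.dones.sum())


# Tiny synthetic example: 6 transitions spanning 2 episodes.
history = LearningHistory(
    task_id="task-000",
    observations=np.zeros((6, 5, 5)),
    actions=np.array([0, 1, 2, 0, 1, 2]),
    rewards=np.array([0.0, 0.0, 1.0, 0.0, 0.0, 1.0]),
    dones=np.array([False, False, True, False, False, True]),
)
```

Under this sketch, the dataset-level figures are simply sums over such records: roughly 30,000 `LearningHistory` objects whose `num_transitions()` totals add up to about 100B and whose `num_episodes()` totals add up to about 2.5B.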
Problem

Research questions and friction points this paper is trying to address.

Lack of challenging, standardized benchmarks for in-context RL
Need for large-scale datasets spanning diverse tasks
Difficulty generalizing in-context RL methods to novel tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale multi-task dataset XLand-100B for in-context RL
Complete learning histories with task metadata and a standardized evaluation protocol
Dataset collection effort of 50,000 GPU hours, released with utilities to reproduce or extend it