NitroGen: An Open Foundation Model for Generalist Gaming Agents

📅 2026-01-04

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of building generalist gaming agents that generalize across games. The authors construct an internet-scale video-action dataset by automatically extracting player actions from publicly available gameplay videos, covering 40,000 hours of gameplay across more than 1,000 games, and use it to train a unified vision-action foundation model with large-scale behavior cloning. To measure cross-game generalization, they introduce a multi-game benchmark environment, and they release the dataset, evaluation suite, and model weights. On unseen games, the model achieves up to 52% relative improvement in task success rates over models trained from scratch. It shows competence across diverse domains, including combat encounters in 3D action games, high-precision control in 2D platformers, and exploration in procedurally generated worlds.
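
As a rough illustration of what behavior cloning means in general, the sketch below fits a tiny linear softmax policy to (observation, action) pairs by minimizing cross-entropy, that is, by ordinary supervised learning on demonstrations. It is not NitroGen's architecture or training code: the two-feature observations, the two discrete actions, and the names `train_bc` and `act` are all hypothetical and chosen only to keep the example self-contained.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(W, obs):
    """One logit per action: dot product of each weight row with obs."""
    return [sum(w * x for w, x in zip(row, obs)) for row in W]


def train_bc(demos, n_features, n_actions, lr=0.5, epochs=200, seed=0):
    """Fit a linear policy to demonstrations with per-sample gradient steps.

    demos is a list of (observation, action_index) pairs. All of this is a
    toy assumption; a real vision-action model would consume video frames.
    """
    rng = random.Random(seed)
    W = [[rng.uniform(-0.01, 0.01) for _ in range(n_features)]
         for _ in range(n_actions)]
    for _ in range(epochs):
        for obs, action in demos:
            probs = softmax(logits_for(W, obs))
            for a in range(n_actions):
                # Gradient of cross-entropy w.r.t. logit a: p_a - 1[a == action]
                grad = probs[a] - (1.0 if a == action else 0.0)
                for j in range(n_features):
                    W[a][j] -= lr * grad * obs[j]
    return W


def act(W, obs):
    """Greedy action: index of the largest logit."""
    logits = logits_for(W, obs)
    return max(range(len(logits)), key=logits.__getitem__)


# Hypothetical demonstrations: feature 0 active -> action 0, feature 1 -> action 1.
demos = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 0), ([0.0, 1.0], 1)]
W = train_bc(demos, n_features=2, n_actions=2)
```

After training, `act(W, [1.0, 0.0])` picks the action the demonstrator took for that observation. The point is only that the policy never sees a reward; it imitates recorded actions.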

📝 Abstract

We introduce NitroGen, a vision-action foundation model for generalist gaming agents that is trained on 40,000 hours of gameplay videos across more than 1,000 games. We incorporate three key ingredients: 1) an internet-scale video-action dataset constructed by automatically extracting player actions from publicly available gameplay videos, 2) a multi-game benchmark environment that can measure cross-game generalization, and 3) a unified vision-action model trained with large-scale behavior cloning. NitroGen exhibits strong competence across diverse domains, including combat encounters in 3D action games, high-precision control in 2D platformers, and exploration in procedurally generated worlds. It transfers effectively to unseen games, achieving up to 52% relative improvement in task success rates over models trained from scratch. We release the dataset, evaluation suite, and model weights to advance research on generalist embodied agents.
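
Because the headline figure is a relative improvement, it may help to see how that differs from an absolute gain in percentage points. The snippet below is a minimal sketch of the standard formula, (improved - baseline) / baseline. The success rates used are made-up numbers for illustration and are not taken from the paper; `relative_improvement` is a hypothetical helper name.

```python
def relative_improvement(baseline, improved):
    """Relative change of `improved` over `baseline`, as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical success rates (NOT from the paper): 25% from scratch, 38% pretrained.
baseline_rate = 0.25
pretrained_rate = 0.38

rel = relative_improvement(baseline_rate, pretrained_rate)
abs_points = (pretrained_rate - baseline_rate) * 100

print(f"relative improvement: {rel:.0%}")
print(f"absolute gain: {abs_points:.0f} percentage points")
```

With these illustrative inputs, a 52% relative improvement corresponds to an absolute gain of only 13 percentage points, so the two measures should not be read interchangeably.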

Problem

Research questions and friction points this paper is trying to address.

generalist gaming agents

cross-game generalization

vision-action modeling

embodied AI

foundation model

Innovation

Methods, ideas, or system contributions that make the work stand out.

foundation model

vision-action modeling

cross-game generalization