🤖 AI Summary
Large language models (LLMs) lack standardized interactive environments, which hinders embodied, experience-driven learning. Method: We propose GEM, a unified environment simulation and training platform for agent-oriented LLMs, inspired by OpenAI Gym. GEM defines a standardized environment-agent interface and supports high-throughput asynchronous vectorized execution, modular environment wrappers, and plug-and-play integration of multiple RL algorithms (PPO, GRPO, REINFORCE). It introduces Return Batch Normalization (ReBN) to improve credit assignment under dense per-turn rewards and establishes a fair benchmarking protocol. Contribution/Results: We conduct systematic baseline evaluations across 24 diverse environments, enabling directly comparable empirical validation of mainstream policy gradient methods in both single- and multi-turn settings. GEM helps move LLM training from static pretraining toward embodied, interactive learning.
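To make the standardized environment-agent interface concrete, here is a minimal, self-contained sketch in the style of the OpenAI Gym convention the summary invokes. The class name, method names, and five-tuple `step` signature are illustrative assumptions, not GEM's confirmed API; in practice the action would be text generated by an LLM policy.

```python
# Sketch of a Gym-style environment-agent loop, assuming GEM follows the
# reset()/step() convention. All names here are illustrative, not GEM's API.
from typing import Any

class EchoEnv:
    """Toy text environment: rewards the agent for echoing a target word."""

    def reset(self) -> tuple[str, dict[str, Any]]:
        self._prompt = "say: hello"
        return self._prompt, {}

    def step(self, action: str) -> tuple[str, float, bool, bool, dict[str, Any]]:
        # Dense per-turn reward: 1.0 for a correct echo, else 0.0.
        reward = 1.0 if action.strip() == "hello" else 0.0
        terminated = True   # single-turn episode, for simplicity
        truncated = False
        return "", reward, terminated, truncated, {}

env = EchoEnv()
obs, info = env.reset()
done = False
while not done:
    action = "hello"  # in practice, an LLM policy generates this text
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
print("episode reward:", reward)
```

The same loop shape extends to multi-turn episodes by returning `terminated=False` until the interaction ends, which is where per-turn rewards and credit assignment come into play.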
📝 Abstract
The training paradigm for large language models (LLMs) is moving from static datasets to experience-based learning, where agents acquire skills via interacting with complex environments. To facilitate this transition, we introduce GEM (General Experience Maker), an open-source environment simulator designed for the age of LLMs. Analogous to OpenAI Gym for traditional reinforcement learning (RL), GEM provides a standardized framework for the environment-agent interface, including asynchronous vectorized execution for high throughput and flexible wrappers for easy extensibility. GEM also features a diverse suite of environments, robust integrated tools, and single-file example scripts demonstrating how to use GEM with five popular RL training frameworks. Along with this, we also provide a set of baselines across 24 environments using REINFORCE with Return Batch Normalization (ReBN), which, unlike GRPO, is compatible with the full RL setting of dense per-turn rewards and offers better credit assignment. We further conduct apples-to-apples benchmarking of PPO, GRPO, and REINFORCE in both single- and multi-turn settings using GEM to shed light on their algorithmic designs. Lastly, GEM also functions as a convenient evaluation toolkit in addition to a training environment. We hope this framework can help accelerate future agentic LLM research.
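The abstract's claim that ReBN, unlike GRPO, handles dense per-turn rewards can be illustrated with a short sketch. This is one plausible reading, assuming ReBN standardizes discounted per-turn returns across the whole batch before the REINFORCE update; consult the paper for the exact formulation. The function names (`discounted_returns`, `rebn_advantages`) are hypothetical.

```python
# Assumed illustration of Return Batch Normalization (ReBN): per-turn
# discounted returns are standardized across the batch and used as
# advantages in a REINFORCE update. Not the paper's verbatim algorithm.
import numpy as np

def discounted_returns(rewards: list[float], gamma: float = 1.0) -> np.ndarray:
    """Return-to-go for each turn of one episode (supports dense rewards)."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def rebn_advantages(batch_rewards: list[list[float]], gamma: float = 1.0) -> list[np.ndarray]:
    """Normalize all per-turn returns in the batch to zero mean, unit std."""
    per_episode = [discounted_returns(r, gamma) for r in batch_rewards]
    flat = np.concatenate(per_episode)
    mean, std = flat.mean(), flat.std() + 1e-8
    return [(g - mean) / std for g in per_episode]

# Example: two episodes with dense per-turn rewards.
adv = rebn_advantages([[0.0, 0.5, 1.0], [0.0, 0.0]], gamma=0.9)
# Each normalized return then scales that turn's policy-gradient term,
# e.g. loss_t = -adv_t * log_prob(action_t).
```

Because each turn keeps its own normalized return, early turns are credited according to what actually followed them; GRPO's group-relative baseline instead assigns one trajectory-level advantage per response, so it cannot distinguish turns within an episode under dense per-turn rewards.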