REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Reinforcement learning (RL) for reasoning tasks is hampered by the scarcity of verifiable reward signals and by evaluation environments whose difficulty cannot be adjusted dynamically. Method: This paper introduces Reasoning Gym, a library of verifiable RL environments spanning eight reasoning domains (e.g., algebra, logic, geometry). Its core contribution is a procedural generation framework that produces effectively unlimited tasks with fine-grained control over complexity, for both training and evaluation, coupled with verifiers that guarantee reward correctness. Contribution/Results: Experiments demonstrate improvements in cross-difficulty generalization, continual learning, and reward alignment. By overcoming key limitations of static datasets, namely fixed difficulty and the lack of adaptive assessment, Reasoning Gym enables rigorous, scalable, and trustworthy RL-based reasoning research.

📝 Abstract
We introduce Reasoning Gym (RG), a library of reasoning environments for reinforcement learning with verifiable rewards. It provides over 100 data generators and verifiers spanning multiple domains including algebra, arithmetic, computation, cognition, geometry, graph theory, logic, and various common games. Its key innovation is the ability to generate virtually infinite training data with adjustable complexity, unlike most previous reasoning datasets, which are typically fixed. This procedural generation approach allows for continuous evaluation across varying difficulty levels. Our experimental results demonstrate the efficacy of RG in both evaluating and reinforcement learning of reasoning models.
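The generator-plus-verifier pattern the abstract describes can be sketched in a few lines. Note this is a minimal, self-contained illustration, not RG's actual API; all names here are hypothetical.

```python
import random


def generate_arithmetic_task(difficulty: int, seed: int) -> dict:
    """Procedurally generate one arithmetic task.

    Higher difficulty widens the operand range, illustrating how a
    generator can expose fine-grained control over task complexity.
    """
    rng = random.Random(seed)  # seeding makes generation reproducible
    limit = 10 ** difficulty
    a, b = rng.randint(1, limit), rng.randint(1, limit)
    return {"question": f"{a} + {b} = ?", "answer": str(a + b)}


def verify(task: dict, model_output: str) -> float:
    """Verifier: an exact-match check yields a binary, trustworthy reward."""
    return 1.0 if model_output.strip() == task["answer"] else 0.0


task = generate_arithmetic_task(difficulty=2, seed=42)
reward = verify(task, task["answer"])  # a correct answer earns reward 1.0
```

Because tasks are generated from a seed rather than drawn from a fixed dataset, the supply of training examples is effectively unlimited and every reward is checkable.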
Problem

Research questions and friction points this paper is trying to address.

Existing reasoning datasets for RL are static and offer limited domain diversity
Fixed datasets cannot supply training data with adjustable complexity
Static benchmarks prevent continuous evaluation across difficulty levels
Innovation

Methods, ideas, or system contributions that make the work stand out.

Library of over 100 verifiable reasoning environments for RL
Procedural generation of virtually infinite training data
Adjustable task complexity enabling continuous evaluation
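The continuous-evaluation idea above can be sketched as a difficulty sweep: measure accuracy at each complexity level rather than on one fixed test set. This is a hypothetical harness with illustrative names, not RG code.

```python
import random


def make_task(difficulty: int, rng: random.Random) -> dict:
    """Toy generator: operand magnitude scales with difficulty."""
    limit = 10 ** difficulty
    a, b = rng.randint(1, limit), rng.randint(1, limit)
    return {"question": f"{a} * {b} = ?", "answer": str(a * b)}


def toy_model(question: str) -> str:
    """Stand-in 'model' that actually computes the product."""
    a, _, b, *_ = question.split()
    return str(int(a) * int(b))


def evaluate(model, difficulties=(1, 2, 3), n=20, seed=0) -> dict:
    """Accuracy per difficulty level: evaluation across a complexity range."""
    rng = random.Random(seed)
    scores = {}
    for d in difficulties:
        correct = sum(
            model(t["question"]) == t["answer"]
            for t in (make_task(d, rng) for _ in range(n))
        )
        scores[d] = correct / n
    return scores


print(evaluate(toy_model))  # → {1: 1.0, 2: 1.0, 3: 1.0}
```

A per-difficulty score profile like this is what makes cross-difficulty generalization measurable: a model trained at low difficulty can be probed at higher levels without collecting new data.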
Authors: Zafir Stojanovski, Oliver Stanley, Joe Sharratt, Richard Jones, A. Adefioye, Jean Kaddour, Andreas Kopf
Affiliation: University College London