REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Reinforcement learning (RL) for reasoning tasks is hampered by the scarcity of verifiable reward signals and by evaluation environments whose difficulty cannot be adjusted dynamically. Method: This paper introduces Reasoning Gym, a library of verifiable RL environments spanning eight reasoning domains (e.g., algebra, logic, geometry). Its core contribution is a procedural generation framework that produces effectively unlimited tasks with fine-grained control over complexity, for both training and evaluation, coupled with verifiers that guarantee reward correctness. Contribution/Results: Experiments demonstrate improvements in cross-difficulty generalization, continual learning, and reward alignment. By overcoming key limitations of static datasets, namely fixed difficulty and the lack of adaptive assessment, Reasoning Gym enables rigorous, scalable, and trustworthy RL-based reasoning research.

📝 Abstract
We introduce Reasoning Gym (RG), a library of reasoning environments for reinforcement learning with verifiable rewards. It provides over 100 data generators and verifiers spanning multiple domains including algebra, arithmetic, computation, cognition, geometry, graph theory, logic, and various common games. Its key innovation is the ability to generate virtually infinite training data with adjustable complexity, unlike most previous reasoning datasets, which are typically fixed. This procedural generation approach allows for continuous evaluation across varying difficulty levels. Our experimental results demonstrate the efficacy of RG in both evaluating and reinforcement learning of reasoning models.
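The generator-plus-verifier pattern the abstract describes can be sketched in a few lines. Note this is a minimal, self-contained illustration, not RG's actual API; all names here are hypothetical.

```python
import random


def generate_arithmetic_task(difficulty: int, seed: int) -> dict:
    """Procedurally generate one arithmetic task.

    Higher difficulty widens the operand range, illustrating how a
    generator can expose fine-grained control over task complexity.
    """
    rng = random.Random(seed)  # seeding makes generation reproducible
    limit = 10 ** difficulty
    a, b = rng.randint(1, limit), rng.randint(1, limit)
    return {"question": f"{a} + {b} = ?", "answer": str(a + b)}


def verify(task: dict, model_output: str) -> float:
    """Verifier: an exact-match check yields a binary, trustworthy reward."""
    return 1.0 if model_output.strip() == task["answer"] else 0.0


task = generate_arithmetic_task(difficulty=2, seed=42)
reward = verify(task, task["answer"])  # a correct answer earns reward 1.0
```

Because tasks are generated from a seed rather than drawn from a fixed dataset, the supply of training examples is effectively unlimited and every reward is checkable.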
Problem

Research questions and friction points this paper is trying to address.

Existing reasoning datasets for RL are static and offer limited domain diversity
Fixed datasets cannot supply training data with adjustable complexity
Static benchmarks prevent continuous evaluation across difficulty levels
Innovation

Methods, ideas, or system contributions that make the work stand out.

Library of over 100 verifiable reasoning environments for RL
Procedural generation of virtually infinite training data
Adjustable task complexity enabling continuous evaluation
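The continuous-evaluation idea above can be sketched as a difficulty sweep: measure accuracy at each complexity level rather than on one fixed test set. This is a hypothetical harness with illustrative names, not RG code.

```python
import random


def make_task(difficulty: int, rng: random.Random) -> dict:
    """Toy generator: operand magnitude scales with difficulty."""
    limit = 10 ** difficulty
    a, b = rng.randint(1, limit), rng.randint(1, limit)
    return {"question": f"{a} * {b} = ?", "answer": str(a * b)}


def toy_model(question: str) -> str:
    """Stand-in 'model' that actually computes the product."""
    a, _, b, *_ = question.split()
    return str(int(a) * int(b))


def evaluate(model, difficulties=(1, 2, 3), n=20, seed=0) -> dict:
    """Accuracy per difficulty level: evaluation across a complexity range."""
    rng = random.Random(seed)
    scores = {}
    for d in difficulties:
        correct = sum(
            model(t["question"]) == t["answer"]
            for t in (make_task(d, rng) for _ in range(n))
        )
        scores[d] = correct / n
    return scores


print(evaluate(toy_model))  # → {1: 1.0, 2: 1.0, 3: 1.0}
```

A per-difficulty score profile like this is what makes cross-difficulty generalization measurable: a model trained at low difficulty can be probed at higher levels without collecting new data.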
Authors: Zafir Stojanovski, Oliver Stanley, Joe Sharratt, Richard Jones, A. Adefioye, Jean Kaddour, Andreas Kopf
Affiliation: University College London