🤖 AI Summary
Reinforcement learning (RL) for reasoning tasks has been hampered by the scarcity of verifiable reward signals and of evaluation environments with adjustable difficulty. Method: This paper introduces Reasoning Gym, a library of verifiable RL environments spanning multiple reasoning domains (e.g., algebra, logic, geometry, graph theory). Its core innovation is a procedural generation framework that produces virtually unlimited training and evaluation data with fine-grained control over task complexity, coupled with verifiers that guarantee reward correctness. Contribution/Results: Experiments demonstrate improvements in cross-difficulty generalization, continual learning, and reward alignment. Reasoning Gym overcomes a fundamental limitation of static datasets, which cannot support continuous difficulty scaling or adaptive assessment, thereby enabling rigorous, scalable, and trustworthy RL-based reasoning research.
📝 Abstract
We introduce Reasoning Gym (RG), a library of reasoning environments for reinforcement learning with verifiable rewards. It provides over 100 data generators and verifiers spanning multiple domains including algebra, arithmetic, computation, cognition, geometry, graph theory, logic, and various common games. Its key innovation is the ability to generate virtually infinite training data with adjustable complexity, unlike most previous reasoning datasets, which are typically fixed. This procedural generation approach allows for continuous evaluation across varying difficulty levels. Our experimental results demonstrate the efficacy of RG in both the evaluation and reinforcement learning of reasoning models.
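The core loop described above, procedurally generating a task at a chosen difficulty and then verifying a candidate answer to produce a reward, can be sketched as follows. This is an illustrative sketch only, not the actual Reasoning Gym API; the `generate_arithmetic` generator, `Task` dataclass, and `verify` function are hypothetical names chosen for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Task:
    question: str
    answer: str


def generate_arithmetic(difficulty: int, rng: random.Random) -> Task:
    """Procedurally generate an arithmetic task.

    `difficulty` controls operand magnitude, giving fine-grained,
    virtually unlimited control over task complexity.
    """
    hi = 10 ** difficulty
    a, b = rng.randint(0, hi), rng.randint(0, hi)
    return Task(question=f"{a} + {b} = ?", answer=str(a + b))


def verify(task: Task, candidate: str) -> float:
    """Verifiable reward: 1.0 iff the candidate matches the ground truth."""
    return 1.0 if candidate.strip() == task.answer else 0.0


rng = random.Random(42)  # seeded RNG makes generation reproducible
task = generate_arithmetic(difficulty=2, rng=rng)
reward = verify(task, task.answer)  # a correct answer earns reward 1.0
```

Because tasks are generated rather than drawn from a fixed dataset, the same generator can be re-invoked at higher `difficulty` values during training or evaluation, which is the continuous-difficulty property the abstract highlights.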