Multilingual Reasoning Gym: Multilingual Scaling of Procedural Reasoning Environments

📅 2026-03-11

🤖 AI Summary

This work addresses a gap in existing procedural reasoning environments: they lack multilingual support, which hinders training and evaluation of multilingual reasoning models. The authors present the Multilingual Reasoning Gym, an extension of Reasoning Gym that generates parallel, verifiable reasoning problems across 14 languages from templates for 94 tasks. Native speakers validated the translations in 10 of the languages, and targeted code or template adaptations keep the problems linguistically natural. The environment retains virtually unlimited problem generation, adjustable difficulty, and verifiable rewards, so it is directly usable for Reinforcement Learning from Verifiable Rewards and for evaluation. The implementation is released to support research on multilingual reasoning models.

📝 Abstract

We present the Multilingual Reasoning Gym, an extension of Reasoning Gym (Stojanovski et al., 2025) that procedurally generates verifiable reasoning problems across 14 languages. We translate templates for 94 tasks, with native-speaker validation in 10 languages and targeted code or template adaptations to ensure linguistic naturalness. The Multilingual Reasoning Gym preserves the core benefits of the procedural generation approach used in the original Reasoning Gym, such as virtually unlimited problem instance generation and adjustable difficulty, and remains directly usable for Reinforcement Learning from Verifiable Rewards and evaluation settings. Because the environments are procedural, problems in the Multilingual Reasoning Gym are parallel across languages, which enables crosslingually parallel data generation at massive scale. We release our implementation to support research into multilingual reasoning models.
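The idea of seeded, parallel, verifiable generation can be illustrated with a toy sketch. This is not the project's API: the task, the `TEMPLATES` dictionary, and the `generate` and `score_answer` functions are hypothetical names invented here, and only three languages are shown. The assumption is that a shared seed fixes the problem's content while the language selects only the surface template, which is one plausible way to obtain parallel instances.

```python
import random

# Hypothetical templates for a single toy task (integer addition).
# The real project covers 94 tasks and 14 languages; these strings are illustrative.
TEMPLATES = {
    "en": "What is {a} + {b}?",
    "de": "Was ist {a} + {b}?",
    "es": "¿Cuánto es {a} + {b}?",
}


def generate(seed: int, lang: str, difficulty: int = 1) -> dict:
    """Build one problem: the seed fixes the numbers, the language picks the wording."""
    rng = random.Random(seed)          # same seed -> same operands in every language
    upper = 10 ** difficulty           # higher difficulty -> larger operands
    a, b = rng.randint(0, upper), rng.randint(0, upper)
    return {
        "question": TEMPLATES[lang].format(a=a, b=b),
        "answer": str(a + b),
    }


def score_answer(answer: str, entry: dict) -> float:
    """Verifiable reward: 1.0 for an exact match with the computed answer, else 0.0."""
    return 1.0 if answer.strip() == entry["answer"] else 0.0


# One parallel instance per language, all sharing the same underlying problem.
parallel = {lang: generate(seed=42, lang=lang, difficulty=2) for lang in TEMPLATES}
```

Because the answer is computed rather than annotated, any number of instances can be produced by varying the seed, and the same checker scores a model's response regardless of the language of the prompt.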

Problem

Research questions and friction points this paper is trying to address.

multilingual reasoning

procedural generation

crosslingual evaluation

reasoning environments

verifiable reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

multilingual reasoning

procedural generation

crosslingual parallel data