Reasoning Core: A Scalable RL Environment for LLM Symbolic Reasoning

📅 2025-09-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) exhibit limited proficiency in foundational symbolic reasoning tasks, including PDDL planning, first-order logic inference, context-free grammar parsing, causal reasoning, and system-of-equations solving. Method: We introduce a scalable, programmatically generated reinforcement learning environment grounded in three design principles: (i) broad, generalizable problem distributions; (ii) external formal verifiers for output validation; and (iii) continuous difficulty scaling. The environment supports effectively unlimited generation of novel, formally verifiable symbolic reasoning tasks and couples reward signals to external verifiers for automatic assessment of complex symbolic outputs, enabling zero-shot transfer evaluation. Contribution/Results: Zero-shot experiments reveal severe performance limitations in state-of-the-art LLMs, confirming the environment's high challenge level. This work provides a systematic, procedurally generated benchmark for symbolic reasoning, offering both a training environment and an evaluation platform to advance LLMs' formal reasoning capabilities.

📝 Abstract
We introduce Reasoning Core, a new scalable environment for Reinforcement Learning with Verifiable Rewards (RLVR), designed to advance foundational symbolic reasoning in Large Language Models (LLMs). Unlike existing benchmarks that focus on games or isolated puzzles, Reasoning Core procedurally generates problems across core formal domains, including PDDL planning, first-order logic, context-free grammar parsing, causal reasoning, and system equation solving. The environment is built on key design principles of high-generality problem distributions, verification via external tools, and continuous difficulty control, which together provide a virtually infinite supply of novel training instances. Initial zero-shot evaluations with frontier LLMs confirm the difficulty of Reasoning Core's tasks, positioning it as a promising resource to improve the reasoning capabilities of future models.
Problem

Research questions and friction points this paper is trying to address.

Advancing foundational symbolic reasoning in Large Language Models
Generating verifiable problems across multiple formal reasoning domains
Providing scalable training instances with continuous difficulty control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scalable RL environment with verifiable rewards
Procedurally generates problems across formal domains
Uses external tools for verification and difficulty control
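The three principles above (broad problem distributions, external verification, continuous difficulty control) can be illustrated with a minimal sketch. This is a hypothetical toy example, not the paper's actual code or API: it procedurally generates linear-equation tasks whose difficulty parameter scales the coefficient range, and scores a model's answer with a verifiable 0/1 reward by substituting it back into the equation.

```python
import random


def generate_task(difficulty, seed=None):
    """Procedurally generate a task: solve a*x + b = c for x.
    `difficulty` widens the coefficient range, giving a virtually
    unbounded supply of novel, formally checkable instances."""
    rng = random.Random(seed)
    bound = 10 ** difficulty
    a = rng.randint(1, bound)
    x = rng.randint(-bound, bound)  # ground-truth solution
    b = rng.randint(-bound, bound)
    c = a * x + b
    prompt = f"Solve for x: {a}*x + {b} = {c}"
    return prompt, (a, b, c)


def verify(task, answer):
    """Verifiable reward: 1.0 iff the answer satisfies the equation.
    The check is independent of how the model produced the answer."""
    a, b, c = task
    try:
        x = int(answer.strip())
    except ValueError:
        return 0.0  # unparseable output earns no reward
    return 1.0 if a * x + b == c else 0.0


prompt, task = generate_task(difficulty=2, seed=0)
# A model's completion would be passed to verify(task, completion)
# to produce the RL training signal.
```

In the actual environment, the same pattern is applied with much richer domains (PDDL plans checked by a planner validator, first-order logic proofs checked by a solver, parses checked against a grammar), but the contract is identical: a generator parameterized by difficulty, and an external checker that converts model output into a reward.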
Valentin Lacombe
Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 - CRIStAL, F-59000 Lille, France
Valentin Quesnel
Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 - CRIStAL, F-59000 Lille, France
Damien Sileo
Inria
Natural Language Processing · Reasoning · Datasets · LLMs · Synthetic data