Training LLMs with LogicReward for Faithful and Rigorous Reasoning

📅 2025-12-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current LLM reasoning training relies largely on final-answer feedback and therefore cannot ensure logical rigor in intermediate reasoning steps, which limits deployment in high-stakes domains. To address this, the paper proposes LogicReward, presented as the first step-level, theorem-prover-based (Lean/Isabelle) verifiable logical reward framework. It integrates soft-unification-guided autoformalization to map natural-language reasoning steps to formal logic with high fidelity, without requiring ground-truth formal labels, and injects logical consistency directly into the reasoning process via reinforcement learning. Experiments show that an 8B-parameter model trained with LogicReward outperforms GPT-4o by 11.6% and o4-mini by 2% on both natural-language and formal-logic reasoning tasks, while generalizing better to mathematical and commonsense reasoning and providing a more robust reward signal.
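To make the step-level idea concrete, here is a minimal illustrative sketch, not the paper's implementation: a real system would call Lean or Isabelle to certify each step, whereas this toy version substitutes a forward-chaining entailment check over Horn clauses as the "prover" and averages per-step verification into a reward.

```python
# Illustrative sketch of a step-level logical reward in the spirit of
# LogicReward. The prover stand-in (`derivable`) and the rule format are
# assumptions for the example; the paper uses Lean/Isabelle instead.

def derivable(fact, facts, rules):
    """Return True if `fact` follows from `facts` via forward chaining.
    Each rule is a (body, head) pair: body is a frozenset of facts,
    head a single fact that fires once the whole body is known."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return fact in known

def logic_reward(steps, premises, rules):
    """Average step-level verification score: a reasoning step earns
    credit only if the stand-in prover certifies it from the current
    context; verified steps then extend that context."""
    known = set(premises)
    verified = 0
    for step in steps:
        if derivable(step, known, rules):
            verified += 1
            known.add(step)
    return verified / len(steps) if steps else 0.0
```

For example, with premise `rain` and rules `rain -> wet` and `wet -> slippery`, the step sequence `["wet", "slippery", "dry"]` scores 2/3: the first two steps are prover-verified, the unsupported third is not.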

📝 Abstract
Although LLMs exhibit strong reasoning capabilities, existing training methods largely depend on outcome-based feedback, which can produce correct answers with flawed reasoning. Prior work introduces supervision on intermediate steps but still lacks guarantees of logical soundness, which is crucial in high-stakes scenarios where logical consistency is paramount. To address this, we propose LogicReward, a novel reward system that guides model training by enforcing step-level logical correctness with a theorem prover. We further introduce Autoformalization with Soft Unification, which reduces natural language ambiguity and improves formalization quality, enabling more effective use of the theorem prover. An 8B model trained on data constructed with LogicReward surpasses GPT-4o and o4-mini by 11.6% and 2% on natural language inference and logical reasoning tasks with simple training procedures. Further analysis shows that LogicReward enhances reasoning faithfulness, improves generalizability to unseen tasks such as math and commonsense reasoning, and provides a reliable reward signal even without ground-truth labels. We will release all data and code at https://llm-symbol.github.io/LogicReward.
Problem

Research questions and friction points this paper is trying to address.

Ensuring logical correctness of intermediate LLM reasoning steps
Reducing ambiguity when formalizing natural language
Improving reasoning faithfulness and generalization to unseen tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

LogicReward uses a theorem prover to enforce step-level logical correctness
Autoformalization with Soft Unification reduces natural-language ambiguity
Training improves faithfulness and generalization without ground-truth labels
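The soft-unification idea above can be sketched as follows. This is a hedged toy illustration, not the paper's method: instead of requiring an exact match between a natural-language predicate and a formal symbol, the formalizer picks the closest known symbol under a similarity measure; here character-level similarity via `difflib` stands in for whatever matcher the paper actually uses.

```python
# Illustrative "soft unification" for autoformalization: map a natural-
# language predicate onto the most similar known formal symbol, rather
# than demanding an exact string match. The threshold and the use of
# SequenceMatcher are assumptions for this example only.
from difflib import SequenceMatcher

def soft_unify(nl_predicate, formal_symbols, threshold=0.6):
    """Return the formal symbol most similar to `nl_predicate`,
    or None if nothing clears the similarity threshold."""
    best, best_score = None, threshold
    for sym in formal_symbols:
        score = SequenceMatcher(None, nl_predicate.lower(), sym.lower()).ratio()
        if score >= best_score:
            best, best_score = sym, score
    return best
```

For instance, the step text "is mortal" would softly unify with an existing symbol `Mortal` rather than spawning a fresh, unrelated predicate, which is what makes the downstream prover check feasible on noisy natural-language steps.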
🔎 Similar Papers
No similar papers found.