Enhancing Logical Reasoning in Language Models via Symbolically-Guided Monte Carlo Process Supervision

📅 2025-05-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current large language models (LLMs) perform well on logical reasoning tasks primarily through memorization of training data, lacking robust symbolic abstraction and generalization capabilities. To address this, we propose a symbolically-guided Monte Carlo process supervision framework: (1) generating symbolic reasoning traces formalized in first-order logic (FOL); (2) automatically tuning a process reward model via Monte Carlo estimation to select high-quality traces; and (3) performing process-supervised fine-tuning (SFT). The approach enables scalable, verifiable neuro-symbolic reasoning: it significantly improves frontier and open-weight models on benchmarks including FOLIO and LogicAsker, while enhancing generalization on cross-domain claim verification. Crucially, it mitigates memorization-driven reasoning biases by grounding inference in explicit, interpretable symbolic structures rather than statistical pattern matching.
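The core of step (2) is Monte Carlo estimation of step-level rewards: from each partial reasoning trace, sample several completions and score the step by the fraction that reach a correct conclusion. The sketch below is a minimal illustration of that idea, not the paper's implementation; the function and parameter names (`sample_completion`, `is_correct`, `n_rollouts`) are hypothetical.

```python
def monte_carlo_step_reward(prefix_steps, sample_completion, is_correct, n_rollouts=8):
    """Estimate a step-level reward for a partial reasoning trace as the
    fraction of sampled rollouts from this prefix that end correctly.

    prefix_steps: list of reasoning steps produced so far (e.g. FOL formulas).
    sample_completion: callable that rolls out a full completion from the prefix.
    is_correct: callable that checks the completion's final conclusion.
    """
    hits = sum(
        1 for _ in range(n_rollouts)
        if is_correct(sample_completion(prefix_steps))
    )
    return hits / n_rollouts
```

The rollout count trades estimation variance against sampling cost; scores estimated this way can then serve as training targets for a learned process reward model, so that trace selection no longer requires fresh rollouts at inference time.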

📝 Abstract
Large language models (LLMs) have shown promising performance on mathematical and logical reasoning benchmarks. However, recent studies have pointed to memorization, rather than generalization, as one of the leading causes of such performance. LLMs, in fact, are susceptible to content variations, demonstrating a lack of robust symbolic abstractions supporting their reasoning process. To improve reliability, many attempts have been made to combine LLMs with symbolic methods. Nevertheless, existing approaches fail to effectively leverage symbolic representations due to the challenges involved in developing reliable and scalable verification mechanisms. In this paper, we propose to overcome such limitations by generating symbolic reasoning trajectories and selecting the high-quality ones using a process reward model automatically tuned based on Monte Carlo estimation. The trajectories are then employed via fine-tuning methods to improve logical reasoning and generalization. Our results on logical reasoning benchmarks such as FOLIO and LogicAsker show the effectiveness of the proposed method, with large gains on frontier and open-weight models. Moreover, additional experiments on claim verification reveal that fine-tuning on the generated symbolic reasoning trajectories enhances out-of-domain generalizability, suggesting the potential impact of symbolically-guided process supervision in alleviating the effect of memorization on LLM reasoning.
Problem

Research questions and friction points this paper is trying to address.

Improving logical reasoning in LLMs via symbolically-guided supervision
Addressing memorization over generalization in LLM reasoning
Enhancing out-of-domain generalization using symbolic reasoning trajectories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generation of symbolic reasoning trajectories
Monte Carlo-based process reward model
Fine-tuning with high-quality trajectories