Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses logical inconsistency and reward hacking in large language model reasoning, failure modes that stem from the stochasticity of next-token prediction and undermine the rigor of generated reasoning chains. To overcome the limitations of passive post-hoc verification, the authors propose a dynamic interleaved reasoning framework guided by formal logical validation. The approach embeds symbolic logic verification directly into the generation process, enabling real-time detection and correction of logical fallacies in intermediate reasoning steps. Training proceeds in two stages, combining supervised fine-tuning with policy optimization to align formal verification with language generation. Evaluated across six benchmarks spanning mathematical, logical, and general reasoning tasks, the 7B and 14B variants of the model outperform current state-of-the-art methods by average margins of 10.4% and 14.2%, respectively.
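The loop below is a minimal sketch of how such interleaved verification could be wired up. The paper does not specify the interfaces, so `generate_step`, `formalize`, and `check` are hypothetical placeholders for the language model, the natural-language-to-logic translator, and the symbolic checker; the retry-on-rejection behavior is one plausible realization of "real-time detection and correction," not the authors' exact procedure.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class StepResult:
    text: str     # natural-language reasoning step
    formula: str  # symbolic translation of the step
    valid: bool   # verdict from the formal checker

def interleaved_reasoning(
    generate_step: Callable[[List[str]], str],  # hypothetical: LLM proposes the next step
    formalize: Callable[[str], str],            # hypothetical: NL step -> logic formula
    check: Callable[[List[str], str], bool],    # hypothetical: premises |- formula ?
    question: str,
    max_steps: int = 8,
    max_retries: int = 2,
) -> List[StepResult]:
    """Generate a reasoning chain, verifying each step as it is produced.

    Instead of validating the finished chain post hoc, every intermediate
    step is translated to a formal statement and checked against the
    premises accepted so far; a rejected step triggers an immediate retry
    with verifier feedback appended to the context.
    """
    context: List[str] = [question]
    premises: List[str] = []
    chain: List[StepResult] = []
    for _ in range(max_steps):
        step: Optional[StepResult] = None
        for _ in range(max_retries + 1):
            text = generate_step(context)
            formula = formalize(text)
            if check(premises, formula):  # step is entailed: accept it
                step = StepResult(text, formula, True)
                break
            # real-time correction: surface the rejection to the model
            context.append(f"[VERIFIER] step rejected as a non sequitur: {text}")
        if step is None:  # retries exhausted: stop the chain early
            break
        context.append(step.text)
        premises.append(step.formula)
        chain.append(step)
    return chain
```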

📝 Abstract
Large Language Models (LLMs) show remarkable capabilities, yet their stochastic next-token prediction creates logical inconsistencies and reward hacking that formal symbolic systems avoid. To bridge this gap, we introduce a formal logic verification-guided framework that dynamically interleaves formal symbolic verification with the natural language generation process, providing real-time feedback to detect and rectify errors as they occur. Unlike previous neuro-symbolic methods, which are limited to passive post-hoc validation, our approach actively penalizes intermediate fallacies during the reasoning chain. We operationalize this framework via a novel two-stage training pipeline that synergizes formal logic verification-guided supervised fine-tuning and policy optimization. Extensive evaluation on six benchmarks spanning mathematical, logical, and general reasoning demonstrates that our 7B and 14B models outperform state-of-the-art baselines by average margins of 10.4% and 14.2%, respectively. These results validate that formal verification can serve as a scalable mechanism to significantly push the performance boundaries of advanced LLM reasoning.
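One plausible reading of "actively penalizes intermediate fallacies" in the policy-optimization stage is a shaped reward that docks a trajectory for every formally refuted step, so a chain cannot score well on the final answer alone. The function below is an illustrative sketch under that assumption; the penalty and bonus coefficients are made up for the example, not the paper's actual reward.

```python
from typing import List

def verification_reward(
    step_valid: List[bool],    # formal verifier's verdict per intermediate step
    answer_correct: bool,      # outcome check on the final answer
    step_penalty: float = 0.5, # assumed coefficient, not from the paper
    answer_bonus: float = 1.0, # assumed coefficient, not from the paper
) -> float:
    """Shaped trajectory reward combining outcome and step-level verification.

    Dense per-step penalties from the formal verifier discourage reward
    hacking: logically invalid steps reduce the return even when the
    final answer happens to be correct.
    """
    fallacies = sum(1 for ok in step_valid if not ok)
    reward = answer_bonus if answer_correct else 0.0
    return reward - step_penalty * fallacies

# e.g. a chain with one refuted step but a correct final answer:
# verification_reward([True, False, True], True) -> 0.5
```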
Problem

Research questions and friction points this paper is trying to address.

logical inconsistency
reward hacking
formal logic verification
natural language reasoning
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

formal logic verification
neuro-symbolic reasoning
real-time error correction
verification-guided training
large language models