Reliable Use of Lemmas via Eligibility Reasoning and Section-Aware Reinforcement Learning

📅 2026-02-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a common failure of large language models in mathematical reasoning: misapplying lemma conclusions without verifying their premises. The authors frame lemma usage as a structured prediction task and propose a two-section output architecture that first validates premises and then assesses conclusion applicability, trained with a section-aware reinforcement learning mechanism. By combining loss masking with joint training on natural-language and formal proofs, the approach attributes each reasoning error to the section responsible for it. Evaluated on in-domain tasks and premise-perturbed scenarios, the method significantly outperforms baseline models while achieving comparable or slightly improved end-to-end mathematical reasoning. Ablation studies confirm that each proposed component is necessary.
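The two-section decision structure described above can be sketched as follows. This is a minimal illustration under assumed mechanics, not the authors' code: the names `LemmaJudgment`, `premises_hold`, and `conclusion_useful` are hypothetical, and the paper's actual output format may differ.

```python
# Hypothetical sketch of the two-section lemma judgment: a precondition
# check and a conclusion-applicability check, with the final usefulness
# decision derived from both rather than predicted as a single label.
from dataclasses import dataclass

@dataclass
class LemmaJudgment:
    premises_hold: bool        # Section 1: do the lemma's preconditions hold?
    conclusion_useful: bool    # Section 2: would the conclusion advance the goal?

    @property
    def useful(self) -> bool:
        # A lemma is usable only when its premises are satisfied AND
        # its conclusion actually helps the current statement.
        return self.premises_hold and self.conclusion_useful

# A conclusion that looks helpful must still be rejected if the
# premises fail -- the failure mode the paper targets.
judgment = LemmaJudgment(premises_hold=False, conclusion_useful=True)
print(judgment.useful)  # False
```

Deriving the decision from two explicit checks, instead of predicting one label, is what lets training localize which check went wrong.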

📝 Abstract
Recent large language models (LLMs) perform strongly on mathematical benchmarks yet often misapply lemmas, importing conclusions without validating assumptions. We formalize lemma-judging as a structured prediction task: given a statement and a candidate lemma, the model must output a precondition check and a conclusion-utility check, from which a usefulness decision is derived. We present RULES, which encodes this specification via a two-section output and trains with reinforcement learning plus section-aware loss masking to assign penalty to the section responsible for errors. Training and evaluation draw on diverse natural language and formal proof corpora; robustness is assessed with a held-out perturbation suite; and end-to-end evaluation spans competition-style, perturbation-aligned, and theorem-based problems across various LLMs. Results show consistent in-domain gains over both a vanilla model and a single-label RL baseline, larger improvements on applicability-breaking perturbations, and parity or modest gains on end-to-end tasks; ablations indicate that the two-section outputs and section-aware reinforcement are both necessary for robustness.
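The section-aware loss masking the abstract describes can be sketched as below. This is a hedged illustration of the assumed mechanics, not the paper's implementation: the function name, section encoding, and attribution input are all hypothetical.

```python
# Hypothetical sketch of section-aware loss masking: when the final
# usefulness decision is wrong, the RL penalty is applied only to the
# tokens of the section blamed for the error, leaving the other
# section's tokens unpenalized.
import numpy as np

def section_masked_advantage(advantages, section_ids, erring_section):
    """Zero the per-token advantage outside the erring section.

    advantages:     per-token advantage (penalty) values
    section_ids:    per-token section labels (e.g. 0 = precondition
                    check, 1 = conclusion-utility check)
    erring_section: the section attributed with the wrong decision
    """
    mask = (np.asarray(section_ids) == erring_section).astype(float)
    return np.asarray(advantages, dtype=float) * mask

# Example: a uniform penalty of -1.0 per token, with the error
# attributed to the precondition-check section (label 0).
adv = section_masked_advantage([-1.0] * 4, [0, 0, 1, 1], erring_section=0)
print(adv)  # [-1. -1.  0.  0.]
```

Masking the penalty this way credits or blames only the section that caused the mistake, which is what distinguishes this objective from a single-label RL baseline that penalizes the whole output uniformly.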
Problem

Research questions and friction points this paper is trying to address.

lemma misuse
precondition validation
mathematical reasoning
large language models
reliable inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

lemma reasoning
structured prediction
section-aware reinforcement learning
precondition validation
robustness evaluation