🤖 AI Summary
This work addresses a common failure mode of large language models in mathematical reasoning: applying a lemma's conclusion without verifying its premises. The authors frame lemma usage as a structured prediction task and propose a two-stage output architecture—first validating premises and then assessing conclusion applicability—augmented with a segment-aware reinforcement learning mechanism. By integrating loss masking and joint training on both natural language and formal proofs, the approach pinpoints which stage of reasoning is responsible for an error. Evaluated on in-domain tasks and premise-perturbed scenarios, the method significantly outperforms baseline models, while achieving comparable or slightly improved performance in end-to-end mathematical reasoning. Ablation studies confirm the necessity of each proposed component.
📝 Abstract
Recent large language models (LLMs) perform strongly on mathematical benchmarks yet often misapply lemmas, importing conclusions without validating assumptions. We formalize lemma-judging as a structured prediction task: given a statement and a candidate lemma, the model must output a precondition check and a conclusion-utility check, from which a usefulness decision is derived. We present RULES, which encodes this specification via a two-section output and trains with reinforcement learning plus section-aware loss masking to assign penalty to the section responsible for errors. Training and evaluation draw on diverse natural language and formal proof corpora; robustness is assessed with a held-out perturbation suite; and end-to-end evaluation spans competition-style, perturbation-aligned, and theorem-based problems across various LLMs. Results show consistent in-domain gains over both a vanilla model and a single-label RL baseline, larger improvements on applicability-breaking perturbations, and parity or modest gains on end-to-end tasks; ablations indicate that the two-section outputs and section-aware reinforcement are both necessary for robustness.
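The two-section specification and the section-aware penalty can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the section names (`premise_check`, `conclusion_utility`), the yes/no output format, and the ±1 reward values are all assumptions chosen for clarity.

```python
# Hypothetical sketch of two-section lemma judging with
# section-aware credit assignment. Names and reward values
# are illustrative assumptions, not the paper's specification.

def parse_judgement(output: str) -> dict:
    """Parse a two-section model output ('name: yes/no' per line)
    into a dict of boolean checks."""
    sections = {}
    for line in output.strip().splitlines():
        key, _, value = line.partition(":")
        sections[key.strip().lower()] = value.strip().lower() == "yes"
    return sections

def derive_usefulness(premise_ok: bool, conclusion_useful: bool) -> bool:
    # A lemma is judged usable only if its preconditions hold
    # AND its conclusion helps the target statement.
    return premise_ok and conclusion_useful

def section_rewards(pred: dict, gold: dict) -> dict:
    # Section-aware credit assignment: penalize only the section(s)
    # whose prediction is wrong, rewarding correct sections.
    return {k: (1.0 if pred.get(k) == gold[k] else -1.0) for k in gold}

# Example: premises validated, but conclusion judged unhelpful.
output = "premise_check: yes\nconclusion_utility: no"
pred = parse_judgement(output)
gold = {"premise_check": True, "conclusion_utility": True}
print(derive_usefulness(pred["premise_check"], pred["conclusion_utility"]))
print(section_rewards(pred, gold))
```

The point of `section_rewards` is that a wrong final decision does not penalize both sections uniformly: only the section that actually erred receives the negative signal, which is the intuition behind the section-aware loss masking described above.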