Guiding LLM-based Loop Invariant Synthesis via Feedback on Local Reasoning Errors

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses verification failures in LLM-based loop invariant synthesis that stem from local reasoning errors in the model's own justification. The authors propose LORIS, a framework that combines formal checking of natural-language reasoning steps with feedback-driven iterative refinement. When a generated invariant fails a verification condition, LORIS prompts the model for a step-by-step natural-language proof, uses an LLM to translate each step into a first-order logic implication, and checks those implications automatically. An invalid implication pinpoints the flawed step, which is then turned into targeted feedback for the next refinement round. On the main benchmark of 460 C programs, LORIS solved 445 and is reported to achieve an overall success rate of 93.1%; it is also reported to be robust on an additional suite of 50 C programs involving non-linear properties.

📝 Abstract

We propose a novel framework that provides constructive feedback to an LLM in the "guess-and-check" paradigm by formally verifying its own thinking process and detecting local reasoning errors. We apply this framework to the loop invariant synthesis problem. We prompt the model to produce a step-by-step natural language proof justifying its thinking process for the failed verification condition of its generated loop invariants. Then, we use an LLM to translate the reasoning steps into first-order logic implications, which can be checked automatically. An invalid implication pinpoints the exact logical flaw in the LLM's thinking process, which we then use to construct targeted feedback for refinement. We have implemented our approach in a tool called LORIS and evaluated it on a main benchmark suite of 460 C programs and an additional benchmark suite of 50 C programs, each of which involves non-linear properties. On the main benchmark suite, LORIS solved 445 of the programs and achieved an overall success rate of $93.1\%$. LORIS also demonstrates robustness on the challenging non-linear benchmark suite.
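
To make the step-checking idea concrete, here is a minimal sketch of how a translated proof could be scanned for its first invalid implication. It is not the LORIS implementation: the example loop, the candidate invariant, the three proof steps, and every name (`STEPS`, `find_counterexample`, `first_invalid_step`) are hypothetical, and a bounded search over small integers stands in for whatever automatic checker the tool actually uses. Such a search can refute an implication by exhibiting a counterexample, but it can never prove one valid.

```python
from itertools import product

# Hypothetical example, not taken from the paper:
#   x = 0; y = 0; while (x < n) { x++; y += 2; }  assert(y == 2 * n);
# Candidate invariant: y == 2*x. It fails the exit condition
#   (y == 2*x and not x < n) ==> y == 2*n.
VARIABLES = ("x", "y", "n")

# Each step of the model's proof, assumed to be already translated into an
# implication (description, premise, conclusion) over an assignment `e`.
STEPS = [
    ("invariant and loop exit give x >= n",
     lambda e: e["y"] == 2 * e["x"] and not e["x"] < e["n"],
     lambda e: e["x"] >= e["n"]),
    ("x >= n gives x == n",
     lambda e: e["y"] == 2 * e["x"] and e["x"] >= e["n"],
     lambda e: e["x"] == e["n"]),
    ("y == 2*x and x == n give y == 2*n",
     lambda e: e["y"] == 2 * e["x"] and e["x"] == e["n"],
     lambda e: e["y"] == 2 * e["n"]),
]


def find_counterexample(premise, conclusion, variables, lo=-5, hi=5):
    """Return an assignment in [lo, hi] where premise holds and conclusion
    fails, or None if the bounded search finds no such assignment."""
    for values in product(range(lo, hi + 1), repeat=len(variables)):
        env = dict(zip(variables, values))
        if premise(env) and not conclusion(env):
            return env
    return None


def first_invalid_step(steps, variables):
    """Return (index, description, counterexample) for the first refuted
    step, or None if no step is refuted within the search bounds."""
    for index, (description, premise, conclusion) in enumerate(steps):
        cex = find_counterexample(premise, conclusion, variables)
        if cex is not None:
            return index, description, cex
    return None


result = first_invalid_step(STEPS, VARIABLES)
if result is None:
    feedback = "No step was refuted within the search bounds."
else:
    index, description, cex = result
    feedback = (f"Step {index} ('{description}') does not follow: "
                f"it fails for {cex}. Revise the invariant or this step.")
print(feedback)
```

In this sketch the second step is the one that gets flagged, because `x >= n` does not rule out `x > n`. Feedback built from that step points toward strengthening the invariant (for example with a bound relating `x` and `n`) rather than merely reporting that the verification condition failed.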

Problem

Research questions and friction points this paper is trying to address.

loop invariant synthesis

large language models

program verification

reasoning errors

formal verification

Innovation

Methods, ideas, or system contributions that make the work stand out.

loop invariant synthesis

large language models

local reasoning errors