LLM For Loop Invariant Generation and Fixing: How Far Are We?

📅 2025-11-09

🤖 AI Summary

This work presents a systematic empirical evaluation of how well large language models (LLMs) infer and repair loop invariants. The study covers open- and closed-source LLMs of varying sizes and measures their success at generating inductive loop invariants and at fixing incorrect ones, both on their own and when supplemented with auxiliary information such as domain knowledge and few-shot examples. LLMs reach a success rate of up to 78% in invariant generation but only 16% in invariant repair, which points to repair as the harder task. Auxiliary information, in particular domain knowledge and illustrative examples, substantially improves performance. The study offers an evaluation setup for LLM-driven invariant inference and identifies auxiliary information as a practical route to better invariant repair.

📝 Abstract

A loop invariant is a property of a loop that holds before and after each iteration of the loop. Identifying loop invariants is a critical step in automated program safety assessment. Recent advancements in Large Language Models (LLMs) have demonstrated potential in diverse software engineering (SE) and formal verification tasks. However, how well LLMs infer loop invariants is not yet well understood. We report an empirical study of both open-source and closed-source LLMs of varying sizes to assess their proficiency in inferring inductive loop invariants for programs and in fixing incorrect invariants. Our findings reveal that while LLMs exhibit some utility in inferring and repairing loop invariants, their performance is substantially enhanced when supplemented with auxiliary information such as domain knowledge and illustrative examples. LLMs achieve a maximum success rate of 78% in generating invariants but are limited to 16% in repairing them.
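
To make the definition concrete, here is a minimal Python sketch of a loop annotated with an invariant. The function `sum_below` and the chosen invariant are illustrative assumptions and are not taken from the paper or its benchmarks; the `assert` statements only check the invariant at run time on one execution, whereas a verifier would prove it for all inputs.

```python
def sum_below(n: int) -> int:
    """Return 0 + 1 + ... + (n - 1) for n >= 0, checking an invariant as it runs."""
    i, s = 0, 0
    # Candidate invariant (hypothetical example): 0 <= i <= n and s == i * (i - 1) // 2
    assert 0 <= i <= n and s == i * (i - 1) // 2  # holds before the first iteration
    while i < n:
        s += i
        i += 1
        assert 0 <= i <= n and s == i * (i - 1) // 2  # holds again after each iteration
    # On exit the guard is false, so i == n and the invariant gives s == n * (n - 1) // 2.
    return s
```

The invariant is useful because, combined with the negated loop guard, it implies the property we care about at loop exit.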

Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' ability to generate and fix loop invariants for programs

Assessing performance of various LLMs on inductive loop invariant inference

Investigating how auxiliary information enhances LLMs' invariant generation and repair
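
The difference between an incorrect invariant and a repaired one can be sketched with a small bounded checker. This is an illustrative assumption, not the paper's evaluation pipeline: the names `pre`, `guard`, `step`, `post`, and `check` are hypothetical, the loop is a toy summation, and the search only enumerates small non-negative states instead of calling a real verifier. It tests the three usual conditions for an inductive invariant: it holds initially, it is preserved by the loop body, and together with the negated guard it implies the postcondition.

```python
from itertools import product
from typing import Callable, Optional, Tuple

State = Tuple[int, int, int]  # (i, s, n) for the loop: while i < n: s += i; i += 1

def pre(st: State) -> bool:
    i, s, n = st
    return i == 0 and s == 0 and n >= 0

def guard(st: State) -> bool:
    i, _, n = st
    return i < n

def step(st: State) -> State:
    i, s, n = st
    return (i + 1, s + i, n)

def post(st: State) -> bool:
    _, s, n = st
    return s == n * (n - 1) // 2

def check(inv: Callable[[State], bool], bound: int = 6) -> Optional[Tuple[str, State]]:
    """Search small non-negative states for a violated condition; None if none is found."""
    for st in product(range(bound + 1), repeat=3):
        if pre(st) and not inv(st):
            return ("initiation", st)
        if inv(st) and guard(st) and not inv(step(st)):
            return ("consecution", st)
        if inv(st) and not guard(st) and not post(st):
            return ("postcondition", st)
    return None

# A candidate that is preserved by the loop but too weak to imply the postcondition.
weak = lambda st: st[1] == st[0] * (st[0] - 1) // 2
# A repaired candidate that also bounds the loop counter.
repaired = lambda st: 0 <= st[0] <= st[2] and st[1] == st[0] * (st[0] - 1) // 2

print(check(weak))      # ('postcondition', (2, 1, 0))
print(check(repaired))  # None
```

The counterexample returned for `weak` is the kind of feedback a repair step could use; a bounded search like this can miss violations outside the enumerated range, so a `None` result is evidence, not proof.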

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs generate loop invariants for program verification

LLMs repair incorrect invariants using auxiliary information

Domain knowledge enhances LLM performance in invariant tasks
