🤖 AI Summary
Automatically generating verifiable formal specifications for Python remains challenging, and many developers forgo automated verification tools because writing contracts by hand is tedious. This work proposes a closed-loop approach that combines large language models (LLMs) with symbolic execution (CrossHair) to generate and iteratively refine icontract-style contract annotations without modifying the original code. Counterexamples from symbolic execution drive the refinement, and the tool can also produce coverage-guided pytest stubs and debugging artifacts. The evaluation indicates that the approach generates CrossHair-compatible specifications for most programs, while also revealing practical limits that stem from bounded symbolic exploration and differences in LLM behavior.
📝 Abstract
Automatically generating formal specifications could reduce the effort needed to improve program correctness, but it remains challenging in practice. Many developers avoid writing contracts by hand, which limits the use of automated verification tools. Recent large language models (LLMs) can generate specifications from code, but these specifications often fail verification because of syntax errors, overly strict constraints, or mismatches with program behavior. We present SpecPylot, a Python tool that synthesizes executable specifications for Python programs as icontract annotations and checks them using CrossHair's symbolic execution. The tool relies on LLMs to propose candidate contracts and uses CrossHair to validate them. When CrossHair finds a concrete counterexample, SpecPylot updates only the generated contracts and leaves the program itself untouched. In addition, the tool can produce coverage-driven pytest stubs and keep detailed execution artifacts that are useful during debugging. Overall, the evaluation indicates that SpecPylot generates CrossHair-compatible contracts for most programs, but it also highlights the practical limits introduced by bounded symbolic exploration and differences in LLM behavior.
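
The propose, validate, and refine loop described in the abstract can be illustrated with a minimal sketch. This is not SpecPylot's implementation: the names `refine_contracts`, `CheckResult`, `toy_propose`, and `toy_check` are hypothetical, and the two toy functions are stand-ins for an LLM call and a CrossHair run, respectively. The contract strings use icontract's `require`/`ensure` decorator style purely as text. The sketch assumes the loop stops after a fixed number of rounds, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    """Outcome of one verification attempt (hypothetical structure)."""
    ok: bool
    counterexample: Optional[str] = None


def refine_contracts(
    source: str,
    propose: Callable[[str, Optional[str], Optional[str]], str],
    check: Callable[[str, str], CheckResult],
    max_rounds: int = 3,
) -> Optional[str]:
    """Return contracts accepted by `check`, or None if the budget runs out.

    Only the contract text changes between rounds; `source` is never edited.
    """
    contracts: Optional[str] = None
    counterexample: Optional[str] = None
    for _ in range(max_rounds):
        # Ask the proposer for new contracts, passing back the last failure.
        contracts = propose(source, contracts, counterexample)
        result = check(source, contracts)
        if result.ok:
            return contracts
        counterexample = result.counterexample
    return None


SOURCE = "def divide(a, b):\n    return a / b\n"


def toy_propose(source, previous, counterexample):
    """Stand-in for an LLM: adds a precondition once a counterexample exists."""
    if previous is None:
        return "@icontract.ensure(lambda a, b, result: result * b == a)"
    return "@icontract.require(lambda b: b != 0)\n" + previous


def toy_check(source, contracts):
    """Stand-in for a symbolic-execution run: rejects a missing b != 0 guard."""
    if "b != 0" in contracts:
        return CheckResult(ok=True)
    return CheckResult(ok=False, counterexample="divide(1, 0)")


final_contracts = refine_contracts(SOURCE, toy_propose, toy_check)
```

In this toy run the first candidate lacks a precondition, the checker reports `divide(1, 0)`, and the second candidate adds the guard. Passing the proposer and checker as callables keeps the loop independent of any particular LLM or verifier, and the program text in `SOURCE` is only read, never rewritten.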