AI Summary
This work addresses the automated derivation of weakest preconditions (WPs), where the WP of a program is the largest set of initial states from which all terminating executions satisfy a given postcondition. We propose Fuzzing Guidance, a novel method that uses large language models (LLMs) to generate candidate WPs and iteratively refines them through fuzzing-based execution feedback and contextual refinement. Crucially, our approach is the first to deeply integrate empirical fuzzing feedback into an LLM-driven WP synthesis pipeline, combining the LLM's symbolic reasoning with dynamic execution evidence. Evaluated on a benchmark suite of deterministic Java array programs, our method substantially improves both the accuracy and the practicality of generated WPs, providing more reliable and scalable automation for formal program verification and run-time error checking.
Abstract
The weakest precondition (WP) of a program describes the largest set of initial states from which all terminating executions of the program satisfy a given postcondition. The generation of WPs is an important task with practical applications in areas ranging from verification to run-time error checking.
This paper proposes combining Large Language Models (LLMs) and fuzz testing to generate WPs. In pursuit of this goal, we introduce Fuzzing Guidance (FG), a means of directing LLMs towards correct WPs using program execution feedback. FG utilises fuzz testing to approximately check the validity and weakness of candidate WPs; this information is then fed back to the LLM as a means of context refinement.
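To illustrate the kind of check FG performs, the following is a minimal sketch (not the authors' implementation) of fuzz testing a candidate WP. The toy `program`, `postcondition`, and intentionally over-strong `candidate_wp` are hypothetical stand-ins: a fuzzer samples random inputs, flags validity counterexamples (states the WP admits but that violate the postcondition) and weakness counterexamples (states the WP excludes even though they satisfy the postcondition), and these counterexamples would then be fed back to the LLM.

```python
import random

def program(xs):
    # Toy deterministic array program: increment every element.
    return [x + 1 for x in xs]

def postcondition(ys):
    # Desired final-state property: all outputs strictly positive.
    return all(y > 0 for y in ys)

def candidate_wp(xs):
    # Hypothetical LLM-proposed precondition. It is valid but too strong:
    # the true WP here is "all elements >= 0".
    return all(x > 5 for x in xs)

def fuzz_check(wp, trials=10_000, seed=0):
    """Approximately check validity and weakness of a candidate WP."""
    rng = random.Random(seed)
    validity_cex, weakness_cex = [], []
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(1, 5))]
        post_holds = postcondition(program(xs))
        if wp(xs) and not post_holds:
            validity_cex.append(xs)   # WP admits a failing initial state
        elif not wp(xs) and post_holds:
            weakness_cex.append(xs)   # WP rejects a good initial state
    return validity_cex, weakness_cex

validity_cex, weakness_cex = fuzz_check(candidate_wp)
# No validity counterexamples, but many weakness counterexamples:
# evidence that the candidate is sound yet stronger than necessary.
```

Because fuzzing only samples the input space, both checks are approximate: an empty counterexample list suggests, but does not prove, that the corresponding property holds.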
We demonstrate the effectiveness of our approach on a comprehensive benchmark set of deterministic array programs in Java. Our experiments indicate that LLMs are capable of producing viable candidate WPs, and that this ability can be practically enhanced through FG.