🤖 AI Summary
Python lacks mature formal verification tools, unlike languages such as C, which benefit from efficient model checkers like CBMC.
Method: This paper proposes PyVeritas, a high-level translation from Python to C driven by large language models (LLMs) and intended to preserve program semantics, bringing LLMs into a formal verification workflow. The approach addresses key limitations of conventional translators (verbosity, low abstraction, and poor maintainability). It couples the translated C code with CBMC for bounded model checking and with a MaxSAT solver for interpretable fault localization.
Contribution/Results: On two Python benchmark suites, the LLM-based translator reaches up to 80–90% translation accuracy with some LLMs. Combined with CBMC and MaxSAT, it enables assertion verification and precise error diagnosis for small yet nontrivial Python programs, making formal verification of such programs considerably more practical.
📝 Abstract
Python has become the dominant language for general-purpose programming, yet it lacks robust tools for formal verification. In contrast, programmers working in languages such as C benefit from mature model checkers, for example CBMC, which enable exhaustive symbolic reasoning and fault localisation. The inherent complexity of Python, coupled with the verbosity and low-level nature of existing transpilers (e.g., Cython), has historically limited the applicability of formal verification to Python programs.
In this paper, we propose PyVeritas, a novel framework that leverages Large Language Models (LLMs) for high-level transpilation from Python to C, followed by bounded model checking and MaxSAT-based fault localisation in the generated C code. PyVeritas enables verification and bug localisation for Python code using existing model checking tools for C. Our empirical evaluation on two Python benchmarks demonstrates that LLM-based transpilation can achieve a high degree of accuracy, up to 80–90% for some LLMs. This enables an effective development environment that supports assertion-based verification and interpretable fault diagnosis for small yet non-trivial Python programs.
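The verification target here is the `assert` statements in a Python program. PyVeritas checks them by translating the program to C and running CBMC, which unrolls loops up to a bound and reasons symbolically over all input values. The sketch below is not that pipeline. It is a much weaker, explicit-state stand-in that enumerates a small input range in plain Python to show what an assertion violation and its counterexample look like. The functions `sum_to`, `sum_to_fixed`, and `find_counterexample`, and the bound of 10, are hypothetical illustrations and are not part of PyVeritas.

```python
from typing import Callable, Optional


def sum_to(n: int) -> int:
    """Hypothetical program under verification: sum of 0..n inclusive.

    The loop has a deliberate off-by-one bug (range(n) instead of
    range(n + 1)), so the assertion below fails for every n >= 1.
    """
    total = 0
    for i in range(n):  # bug: should be range(n + 1)
        total += i
    # Specification: the result must match the closed form n(n+1)/2.
    assert total == n * (n + 1) // 2, "closed-form mismatch"
    return total


def sum_to_fixed(n: int) -> int:
    """Corrected version of sum_to; the assertion holds for all n >= 0."""
    total = 0
    for i in range(n + 1):
        total += i
    assert total == n * (n + 1) // 2, "closed-form mismatch"
    return total


def find_counterexample(f: Callable[[int], int], bound: int) -> Optional[int]:
    """Run f on every input 0..bound and return the first one that
    triggers an AssertionError, or None if none does.

    Unlike CBMC, this checks only the enumerated inputs, so None means
    "no violation within the bound", not a proof of correctness.
    """
    for n in range(bound + 1):
        try:
            f(n)
        except AssertionError:
            return n
    return None


if __name__ == "__main__":
    print(find_counterexample(sum_to, 10))        # 1
    print(find_counterexample(sum_to_fixed, 10))  # None
```

Python strips `assert` statements when run with `-O`, so the sketch must run without that flag. In PyVeritas, the counterpart of this counterexample would come from CBMC on the translated C code. The MaxSAT-based step would then aim to localise the faulty statement, which here is the `range(n)` bound.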