PyVeritas: On Verifying Python via LLM-Based Transpilation and Bounded Model Checking for C

📅 2025-08-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Python lacks mature formal verification tools, whereas languages such as C benefit from efficient model checkers like CBMC. Method: This paper proposes a high-level translation from Python to C, performed by large language models (LLMs) and intended to preserve program semantics, as a novel way of bringing LLMs into a formal verification workflow. The approach targets key limitations of conventional translators (verbosity, low abstraction, and poor maintainability) and couples the translated C code with CBMC for bounded model checking and a MaxSAT solver for interpretable fault localization. Contribution/Results: Evaluated on two Python benchmark suites, LLM-based translation reaches up to 80–90% accuracy for some LLMs. This enables assertion verification and interpretable fault diagnosis for small yet nontrivial Python programs, making formal verification of Python more practical.

📝 Abstract

Python has become the dominant language for general-purpose programming, yet it lacks robust tools for formal verification. In contrast, programmers working in languages such as C benefit from mature model checkers, for example CBMC, which enable exhaustive symbolic reasoning and fault localisation. The inherent complexity of Python, coupled with the verbosity and low-level nature of existing transpilers (e.g., Cython), has historically limited the applicability of formal verification to Python programs. In this paper, we propose PyVeritas, a novel framework that leverages Large Language Models (LLMs) for high-level transpilation from Python to C, followed by bounded model checking and MaxSAT-based fault localisation in the generated C code. PyVeritas enables verification and bug localisation for Python code using existing model checking tools for C. Our empirical evaluation on two Python benchmarks demonstrates that LLM-based transpilation can achieve a high degree of accuracy, up to 80–90% for some LLMs, enabling an effective development environment that supports assertion-based verification and interpretable fault diagnosis for small yet non-trivial Python programs.
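
To make the pipeline in the abstract concrete, here is a minimal sketch of the transpile-then-check step. Everything in it is an assumption for illustration: the `abs_diff` example, the hand-written C text standing in for LLM output, the input bounds, and the helper `cbmc_command` are hypothetical and not taken from the paper. Only the CBMC pieces (`--unwind`, `--unwinding-assertions`, `__CPROVER_assume`, and the `nondet_` naming convention) are real features of the tool.

```python
# Hypothetical Python input: a small function with an assertion to verify.
PYTHON_SOURCE = '''
def abs_diff(a: int, b: int) -> int:
    d = a - b if a > b else b - a
    assert d >= 0
    return d
'''

# Hand-written C counterpart, standing in for what an LLM transpiler might emit.
# Python integers are unbounded while C ints are not, so the harness bounds the
# inputs (an assumption of this sketch) to keep the two programs comparable.
C_SOURCE = '''
#include <assert.h>

int nondet_int(void);  /* CBMC treats nondet_* functions as arbitrary values */

int main(void) {
    int a = nondet_int();
    int b = nondet_int();
    __CPROVER_assume(a >= -1000 && a <= 1000);
    __CPROVER_assume(b >= -1000 && b <= 1000);
    int d = a > b ? a - b : b - a;
    assert(d >= 0);
    return 0;
}
'''


def cbmc_command(c_file: str, unwind: int) -> list[str]:
    """Build (but do not run) a CBMC invocation with a loop-unwinding bound."""
    return ["cbmc", c_file, "--unwind", str(unwind), "--unwinding-assertions"]


if __name__ == "__main__":
    print(" ".join(cbmc_command("abs_diff.c", 5)))
```

The sketch only constructs the command line; running it requires saving `C_SOURCE` to a file and having CBMC installed. The unwinding bound is what makes the check "bounded": loops are explored only up to that depth, and `--unwinding-assertions` reports when the bound was too small.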

Problem

Research questions and friction points this paper is trying to address.

Python lacks robust tools for formal verification

Transpiling Python to C with LLMs so that existing C model checkers can be applied

Enabling fault localization in Python using C verification tools

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based transpilation from Python to C

Bounded model checking on generated C code

MaxSAT-based fault localization in C
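
The idea behind MaxSAT-based fault localization can be illustrated with a toy example. This is a sketch under stated assumptions, not the paper's implementation: the two-statement program, the seeded bug, the failing test, and the brute-force search over a small integer domain (used here in place of a real MaxSAT solver) are all hypothetical. The failing test input and the assertion act as hard constraints, each program statement is a soft constraint, and the smallest sets of statements that must be dropped to restore satisfiability are reported as candidate fault locations.

```python
from itertools import combinations, product

# Small bounded integer domain, standing in for a solver's search space.
DOMAIN = range(-10, 11)

# Soft constraints: each statement relates the variables (a, x, y).
STATEMENTS = {
    "s1: x = a + 1": lambda a, x, y: x == a + 1,
    "s2: y = x * 2": lambda a, x, y: y == x * 2,  # seeded bug: intended y = x + 2
}


def hard(a, x, y):
    """Hard constraints: the failing test input and the assertion y == a + 3."""
    return a == 2 and y == a + 3


def satisfiable(kept):
    """Check by enumeration whether the hard constraints and kept statements can all hold."""
    return any(
        hard(a, x, y) and all(STATEMENTS[s](a, x, y) for s in kept)
        for a, x, y in product(DOMAIN, repeat=3)
    )


def localise():
    """Return the smallest sets of statements whose removal makes the test satisfiable."""
    names = list(STATEMENTS)
    for k in range(len(names) + 1):  # drop as few statements as possible
        found = []
        for dropped in combinations(names, k):
            kept = [s for s in names if s not in dropped]
            if satisfiable(kept):
                found.append(set(dropped))
        if found:
            return found
    return []


if __name__ == "__main__":
    print(localise())
```

Here dropping `s1` alone does not help, because no integer `x` satisfies `2 * x == 5`, so only `s2` is reported. A real tool would encode the C program symbolically and hand the clauses to a MaxSAT solver instead of enumerating values.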
