🤖 AI Summary
Detecting subtle bugs in Boogie, a widely used intermediate verification language, is challenging because existing formal models do not cover the full implementation.
Method: We propose BCC, a lightweight model-based testing technique grounded in executable operational semantics. BCC integrates the PLT Redex framework with a small, deterministic subset of Boogie’s operational semantics to automatically generate random programs; it then identifies bugs by comparing semantic simulation results against actual Boogie verification outcomes.
Contribution/Results: BCC breaks from the conventional reliance on full formal models by using executable semantics to drive randomized testing, thereby exercising complex implementation paths in the toolchain that formal models leave uncovered. In the evaluation, BCC generated three million test programs and uncovered completeness violations (spurious verification failures) in 2% of them. These findings demonstrate BCC's effectiveness and practicality for validating the reliability of verification tools themselves.
📝 Abstract
Lightweight validation techniques, such as those based on random testing, are sometimes practical alternatives to full formal verification: they provide valuable benefits, such as finding bugs, without requiring a disproportionate effort. In fact, they can be useful even for fully formally verified tools, by exercising the parts of a complex system that go beyond the reach of formal models.
In this context, this paper introduces BCC: a model-based testing technique for the Boogie intermediate verifier. BCC combines the formalization of a small, deterministic subset of the Boogie language with the generative capabilities of the PLT Redex language engineering framework. Basically, BCC uses PLT Redex to generate random Boogie programs, and to execute them according to a formal operational semantics; then, it runs the same programs through the Boogie verifier. Any inconsistency between the two executions (in PLT Redex and with Boogie) may indicate a potential bug in Boogie's implementation.
To understand whether BCC can be useful in practice, we used it to generate three million Boogie programs. These experiments found that 2% of cases were indicative of completeness failures (i.e., spurious verification failures) in Boogie's toolchain. These results indicate that lightweight analysis tools, such as those for model-based random testing, are also useful to test and validate formal verification tools such as Boogie.
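The differential comparison at the core of BCC can be sketched as a small harness. The sketch below is illustrative, not the paper's implementation: the names `classify` and `run_campaign` are hypothetical, and the `semantics` and `verifier` callables stand in for, respectively, executing a program under the PLT Redex operational semantics and running it through the Boogie verifier.

```python
def classify(semantics_ok: bool, boogie_verified: bool) -> str:
    """Classify one test program by comparing the two verdicts.

    semantics_ok:    True if executing the program under the formal
                     operational semantics violates no assertion.
    boogie_verified: True if the Boogie verifier accepts the program.
    """
    if semantics_ok and not boogie_verified:
        # Semantics says the program is correct, but Boogie rejects it:
        # a spurious verification failure (completeness bug).
        return "completeness bug"
    if not semantics_ok and boogie_verified:
        # Boogie accepts a program whose execution violates an assertion:
        # the verifier missed a real failure (soundness bug).
        return "soundness bug"
    return "agree"


def run_campaign(programs, semantics, verifier):
    """Run the differential loop over generated programs and tally outcomes."""
    tally = {"agree": 0, "completeness bug": 0, "soundness bug": 0}
    for program in programs:
        tally[classify(semantics(program), verifier(program))] += 1
    return tally
```

In the actual tool chain, the program stream would come from PLT Redex's random generator, and any program classified as a disagreement would be saved for manual inspection, since a disagreement only *may* indicate a Boogie bug (the formalized subset is small and deterministic by design).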