🤖 AI Summary
Detecting subtle bugs in Boogie, a widely used intermediate verification language, is challenging because existing formal models do not cover the full implementation.
Method: We propose BCC, a lightweight model-based testing technique grounded in executable operational semantics. BCC integrates the PLT Redex framework with a small, deterministic subset of Boogie’s operational semantics to automatically generate random programs; it then identifies bugs by comparing semantic simulation results against actual Boogie verification outcomes.
Contribution/Results: BCC breaks from the conventional reliance on full formal models by using executable semantics to drive randomized testing, thereby exercising complex implementation paths in the toolchain that formal models leave uncovered. In the evaluation, BCC generated three million test programs and uncovered completeness violations (spurious verification failures) in 2% of them. These findings demonstrate BCC's effectiveness and practicality for validating the reliability of verification tools themselves.
📝 Abstract
Lightweight validation techniques, such as those based on random testing, are sometimes practical alternatives to full formal verification: they provide valuable benefits, such as finding bugs, without requiring a disproportionate effort. In fact, they can be useful even for fully formally verified tools, by exercising the parts of a complex system that go beyond the reach of formal models.
In this context, this paper introduces BCC: a model-based testing technique for the Boogie intermediate verifier. BCC combines the formalization of a small, deterministic subset of the Boogie language with the generative capabilities of the PLT Redex language engineering framework. Basically, BCC uses PLT Redex to generate random Boogie programs, and to execute them according to a formal operational semantics; then, it runs the same programs through the Boogie verifier. Any inconsistency between the two executions (in PLT Redex and with Boogie) may indicate a potential bug in Boogie's implementation.
To understand whether BCC can be useful in practice, we used it to generate three million Boogie programs. These experiments found that 2% of cases were indicative of completeness failures (i.e., spurious verification failures) in Boogie's toolchain. These results indicate that lightweight analysis tools, such as those for model-based random testing, are also useful to test and validate formal verification tools such as Boogie.
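The differential comparison at the core of BCC can be sketched as a small harness. The sketch below is illustrative, not the paper's implementation: the names `classify` and `run_campaign` are hypothetical, and the `semantics` and `verifier` callables stand in for, respectively, executing a program under the PLT Redex operational semantics and running it through the Boogie verifier.

```python
def classify(semantics_ok: bool, boogie_verified: bool) -> str:
    """Classify one test program by comparing the two verdicts.

    semantics_ok:    True if executing the program under the formal
                     operational semantics violates no assertion.
    boogie_verified: True if the Boogie verifier accepts the program.
    """
    if semantics_ok and not boogie_verified:
        # Semantics says the program is correct, but Boogie rejects it:
        # a spurious verification failure (completeness bug).
        return "completeness bug"
    if not semantics_ok and boogie_verified:
        # Boogie accepts a program whose execution violates an assertion:
        # the verifier missed a real failure (soundness bug).
        return "soundness bug"
    return "agree"


def run_campaign(programs, semantics, verifier):
    """Run the differential loop over generated programs and tally outcomes."""
    tally = {"agree": 0, "completeness bug": 0, "soundness bug": 0}
    for program in programs:
        tally[classify(semantics(program), verifier(program))] += 1
    return tally
```

In the actual tool chain, the program stream would come from PLT Redex's random generator, and any program classified as a disagreement would be saved for manual inspection, since a disagreement only *may* indicate a Boogie bug (the formalized subset is small and deterministic by design).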