AI Summary
This study addresses the lack of empirical evidence for choosing a theorem prover by conducting the first systematic, cross-platform comparison of Coq and Idris2, evaluated on a single shared task: verifying the correctness of insertion sort. The methodology uses interactive formal verification, covering implementation, proof-strategy design, and standard-library usage, to support both qualitative and empirical analysis along three dimensions: usability, community support, and library ecosystem. The results indicate that Coq has significant advantages in standard-library completeness, toolchain maturity, and community resources. Idris2, by contrast, shows innovative potential in proof expressiveness and in integrating programs with their proofs, owing to its dependent type system and built-in computational capabilities. This work establishes the first empirically grounded, task-aligned benchmark for cross-prover evaluation and gives practitioners actionable guidance on choosing formal verification tools and designing such systems.
Abstract
Theorem provers are important tools for people working in formal verification. A myriad of interactive systems are available today, with varying features and approaches motivating their development. These design choices affect their usability, as does the problem domain in which they are employed. We test-drive two such provers, Coq and Idris2, by proving the correctness of insertion sort, and then provide a qualitative evaluation of their performance. We also compare their community and library support. This work helps users make an informed choice of system and highlights approaches in other systems that developers might find useful.
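For readers unfamiliar with the benchmark task, the Python sketch below shows a functional-style insertion sort together with the two properties commonly taken as its correctness specification: the output is sorted, and it is a permutation of the input. The abstract does not state the exact specification used in the study, so this formulation is an assumption, and the helper names (`insert`, `is_sorted`, `is_permutation`, `satisfies_spec`) are hypothetical. The key difference from the provers is that this code only checks the properties on particular inputs at runtime, whereas in Coq or Idris2 they would be proved once for all inputs.

```python
from collections import Counter
from typing import List


def insert(x: int, sorted_xs: List[int]) -> List[int]:
    """Insert x into an already-sorted list so the result stays sorted.

    Written recursively to mirror the usual functional definition
    found in proof-assistant developments (an assumption, not the
    study's code).
    """
    if not sorted_xs or x <= sorted_xs[0]:
        return [x] + sorted_xs
    return [sorted_xs[0]] + insert(x, sorted_xs[1:])


def insertion_sort(xs: List[int]) -> List[int]:
    """Sort by inserting each element, left to right, into an accumulator."""
    result: List[int] = []
    for x in xs:
        result = insert(x, result)
    return result


def is_sorted(xs: List[int]) -> bool:
    """Property 1: every adjacent pair is in non-decreasing order."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def is_permutation(xs: List[int], ys: List[int]) -> bool:
    """Property 2: both lists contain the same elements with the same multiplicities."""
    return Counter(xs) == Counter(ys)


def satisfies_spec(xs: List[int]) -> bool:
    """Runtime check of the (assumed) specification on one input."""
    ys = insertion_sort(xs)
    return is_sorted(ys) and is_permutation(xs, ys)


if __name__ == "__main__":
    sample = [3, 1, 2, 3, 0]
    print(insertion_sort(sample))  # [0, 1, 2, 3, 3]
    print(satisfies_spec(sample))  # True
```

Both properties are needed: a function that always returns the empty list is trivially sorted but is not a permutation of its input, which is why a full correctness proof has to establish the two together.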