AI Summary
Existing machine learning verification tools suffer from fragmentation, limited expressiveness (particularly when specifying complex properties such as multi-network coordination), and poor cross-model support. This paper introduces CAISAR, an open-source platform featuring the first general-purpose formal specification language that supports diverse models, including neural networks, SVMs, and gradient-boosted trees, and that can model sophisticated properties beyond local robustness. CAISAR connects these specifications to mainstream verifiers (e.g., Marabou, ERAN) through automated graph-editing techniques, so that unmodified, off-the-shelf verifiers can be plugged in directly. It is compatible with VNN-Comp solvers and has verified intricate properties, including multi-network coordination, in several real-world scenarios. All artifacts, including specifications, benchmarks, and tooling, are publicly released on Zenodo to ensure full reproducibility.
Abstract
The formal specification and verification of machine learning programs has seen remarkable progress in less than a decade, leading to a profusion of tools. However, this diversity may lead to fragmentation, resulting in tools that are difficult to compare except on very specific benchmarks. Furthermore, this progress is heavily geared towards the specification and verification of a single class of properties, namely local robustness properties. While provers are becoming increasingly efficient at solving local robustness properties, even slightly more complex properties, such as those involving multiple neural networks, cannot be expressed in the input languages of the winners of the International Competition of Verification of Neural Networks (VNN-Comp). In this tool paper, we present CAISAR, an open-source platform dedicated to machine learning specification and verification. We present its specification language, which is suitable for modelling complex properties of neural networks, support vector machines and boosted trees. We show on concrete use cases how specifications written in this language are automatically translated into queries for state-of-the-art provers, notably by using automated graph editing techniques, which makes it possible to use the provers' off-the-shelf versions. The artifact to reproduce the paper's claims is available at the following DOI: https://doi.org/10.5281/zenodo.15209510