🤖 AI Summary
Floating-point software implementations of neural networks in safety-critical systems lack provable correctness guarantees. Method: This work introduces the first software-level formal verification approach for C-language neural network code, explicitly modeling floating-point arithmetic semantics. It establishes NeuroCodeBench 2.0, a large-scale, open-source benchmark comprising 912 SV-COMP-compliant verification tasks covering activation functions, canonical layers, and end-to-end network architectures. Contribution/Results: The benchmark enables systematic evaluation and improvement of neural network verifiers: eight state-of-the-art tools correctly solve only 11% of the tasks on average while producing roughly 3% incorrect verdicts, and a historical analysis shows that the benchmark's release has already helped reduce those incorrect verdicts. This work bridges a critical gap in formal verification of floating-point neural network implementations at the software level, providing a reproducible benchmark and methodological foundation for AI system safety certification.
📝 Abstract
The behaviour of neural network components must be proven correct before deployment in safety-critical systems. Unfortunately, existing neural network verification techniques cannot certify the absence of faults at the software level. In this paper, we show how to specify and verify that neural networks are safe by explicitly reasoning about their floating-point implementation. In doing so, we construct NeuroCodeBench 2.0, a benchmark comprising 912 neural network verification examples that cover activation functions, common layers, and full neural networks of up to 170K parameters. Our verification suite is written in plain C and is compatible with the format of the International Competition on Software Verification (SV-COMP). This compatibility allows us to conduct the first rigorous evaluation of eight state-of-the-art software verifiers on neural network code. The results show that existing automated verification tools can correctly solve an average of 11% of our benchmark, while producing around 3% incorrect verdicts. At the same time, a historical analysis reveals that the release of our benchmark has already had a significantly positive impact on the latter figure, reducing the rate of incorrect verdicts.