🤖 AI Summary
This work addresses the limited representativeness of Java programs in the SV-COMP benchmark suite, which hampers the effective evaluation of verification tools. To bridge this gap, the authors apply the ARG-V tool for automatic benchmark generation, aligned with SV-COMP verification task specifications, to construct 68 structurally sound and semantically realistic Java benchmark programs. Experimental evaluation reveals a significant drop in both accuracy and recall for four state-of-the-art Java verifiers on this new benchmark, exposing their limitations in handling real-world software behaviors. These findings not only provide concrete evidence for guiding future tool improvements but also substantially enhance the coverage and realism of the SV-COMP Java benchmark suite for more reliable empirical assessment.
📝 Abstract
The SV-COMP competition provides a state-of-the-art platform for evaluating software verification tools on a standardized set of verification tasks. Consequently, verifier development outcomes are influenced by the composition of program benchmarks included in SV-COMP. When expanding this benchmark corpus, it is crucial to consider whether newly added programs cause verifiers to exhibit behavior distinct from that observed on existing benchmarks. Doing so helps mitigate external threats to the validity of the competition's results. In this paper, we present the application of the ARG-V tool for automatically generating Java verification benchmarks in the SV-COMP format. We demonstrate that, on a newly generated set of 68 realistic benchmarks, all four leading Java verifiers decrease in accuracy and recall compared to their performance on the existing benchmark suite. These findings highlight the potential of ARG-V to enhance the comprehensiveness and realism of verification tool evaluation, while also providing a roadmap for verifier developers aiming to improve their tools' applicability to real-world software.