EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning

πŸ“… 2026-02-25
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing hardware model checking benchmarks are limited in number, structurally homogeneous, and often lack the original RTL designs, hindering effective evaluation of verification tools and risking solver overfitting. This work proposes the first approach to integrate reinforcement learning into benchmark generation for hardware model checking. By generating computation graphs at an algorithmic level of abstraction and compiling them via high-level synthesis (HLS) into functionally equivalent yet structurally diverse hardware designs, the method automatically constructs small-yet-hard instances. Solver runtime serves as the reward signal, enabling a closed loop between design-space exploration and solver feedback. The resulting benchmarks, produced in standard AIGER/BTOR2 formats, exhibit significant structural diversity and effectively expose performance bottlenecks in state-of-the-art model checkers, thereby providing a high-quality, unbiased resource for future tool evaluation.

πŸ“ Abstract
Progress in hardware model checking depends critically on high-quality benchmarks. However, the community faces a significant benchmark gap: existing suites are limited in number, often distributed only in representations such as BTOR2 without access to the originating register-transfer-level (RTL) designs, and biased toward extreme difficulty where instances are either trivial or intractable. These limitations hinder rigorous evaluation of new verification techniques and encourage overfitting of solver heuristics to a narrow set of problems. To address this, we introduce EvolveGen, a framework for generating hardware model checking benchmarks by combining reinforcement learning (RL) with high-level synthesis (HLS). Our approach operates at an algorithmic level of abstraction in which an RL agent learns to construct computation graphs. By compiling these graphs under different synthesis directives, we produce pairs of functionally equivalent but structurally distinct hardware designs, inducing challenging model checking instances. Solver runtime is used as the reward signal, enabling the agent to autonomously discover and generate small-but-hard instances that expose solver-specific weaknesses. Experiments show that EvolveGen efficiently creates a diverse benchmark set in standard formats (e.g., AIGER and BTOR2) and effectively reveals performance bottlenecks in state-of-the-art model checkers.
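The search loop the abstract describes can be sketched in a few lines. The snippet below is a hypothetical, heavily simplified stand-in for EvolveGen (all names and the scoring proxy are illustrative, not from the paper): a candidate computation graph is mutated step by step, the "solver runtime" on the compiled instance is used as the reward, and mutations that increase the reward are kept, steering the search toward harder instances. A real implementation would replace `solver_runtime` with actually compiling the graph via HLS and timing a model checker on the resulting AIGER/BTOR2 instance, and would use a learned RL policy rather than greedy hill climbing.

```python
import random

# Operators a graph node may apply (illustrative set).
OPS = ["add", "mul", "xor", "mux"]

def random_graph(n_nodes, rng):
    """Build a random computation graph: node i applies an op to two
    earlier nodes (indices 0 and 1 stand for primary inputs)."""
    return [(rng.choice(OPS), rng.randrange(i + 2), rng.randrange(i + 2))
            for i in range(n_nodes)]

def solver_runtime(graph):
    """Stand-in for the reward: a real system would compile the graph via
    HLS, run a model checker on the AIGER/BTOR2 instance, and return the
    measured runtime. Here we use a toy proxy (multipliers are 'harder')."""
    return sum(3.0 if op == "mul" else 1.0 for op, _, _ in graph)

def evolve(n_nodes=8, steps=200, seed=0):
    """Greedy hill climbing as a minimal stand-in for the RL agent:
    mutate one node at a time, keep the mutation if the reward grows."""
    rng = random.Random(seed)
    best = random_graph(n_nodes, rng)
    best_reward = solver_runtime(best)
    for _ in range(steps):
        cand = list(best)
        i = rng.randrange(n_nodes)
        cand[i] = (rng.choice(OPS), rng.randrange(i + 2), rng.randrange(i + 2))
        r = solver_runtime(cand)
        if r > best_reward:  # reward = solver runtime: harder is better
            best, best_reward = cand, r
    return best, best_reward
```

The closed loop the paper describes adds two ingredients this sketch omits: compiling each graph under different HLS directives to obtain functionally equivalent but structurally distinct designs (whose equivalence check becomes the model checking instance), and a trained policy that generalizes across solvers instead of blind local search.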
Problem

Research questions and friction points this paper is trying to address.

hardware model checking
benchmark generation
RTL designs
solver evaluation
verification benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

reinforcement learning
hardware model checking
benchmark generation
high-level synthesis
computation graph