AssertLLM2: A Comprehensive LLM Benchmark for Assertion Generation from Design Specifications

📅 2026-05-26
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses two problems. Writing SystemVerilog Assertions (SVAs) by hand is slow and error-prone, and existing benchmarks for evaluating large language models (LLMs) on this task lack realistic verification scenarios. To bridge this gap, the paper introduces AssertLLM2, an open-source benchmark for assertion generation in hardware verification. It contains 83 real-world designs across 13 functional categories and supports two practical tasks: bug-prevention and bug-hunting. To the authors' knowledge, it is the first benchmark to use systematically mutated buggy RTL as input. It evaluates generated assertions along four dimensions: syntactic correctness, formal provability, coverage, and mutation-based bug detection. Each design comes with a structured design specification, a verified, dependency-complete golden RTL implementation, and buggy RTL variants produced by fault injection. Together, these give a rigorous baseline for evaluating LLMs on hardware verification.
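The summary mentions buggy RTL variants produced by systematic fault injection. This section does not describe the exact mutation operators, so what follows is a minimal Python sketch under one assumption: faults are single-point operator swaps applied directly to RTL text. The operator table and `generate_mutants` are hypothetical names chosen for illustration. A production mutation tool would work on a parsed syntax tree rather than raw strings. For example, this toy version would also match the `==` inside a `===`.

```python
import re
from typing import List, Tuple

# Hypothetical single-point mutation operators (original -> faulty replacement).
# Illustrative only; not the benchmark's actual mutation set.
MUTATION_OPERATORS: List[Tuple[str, str]] = [
    ("==", "!="),  # flip equality comparison
    ("!=", "=="),  # flip inequality comparison
    ("&&", "||"),  # swap logical AND for OR
    ("||", "&&"),  # swap logical OR for AND
]


def generate_mutants(rtl: str) -> List[Tuple[str, str]]:
    """Return (description, mutated_rtl) pairs, one mutant per operator occurrence.

    Each mutant differs from the input in exactly one place, so a failing
    assertion can be traced back to a single injected fault.
    """
    mutants: List[Tuple[str, str]] = []
    for original, replacement in MUTATION_OPERATORS:
        for match in re.finditer(re.escape(original), rtl):
            start, end = match.span()
            mutated = rtl[:start] + replacement + rtl[end:]
            mutants.append((f"{original} -> {replacement} at offset {start}", mutated))
    return mutants


if __name__ == "__main__":
    # Toy golden RTL (hypothetical example design).
    golden = (
        "module arb(input logic req_a, req_b, en, output logic grant);\n"
        "  assign grant = en && (req_a == req_b);\n"
        "endmodule\n"
    )
    for desc, _ in generate_mutants(golden):
        print(desc)
```

Keeping each mutant to a single change is a deliberate choice in this sketch. When an assertion fails on a mutant, the failure can be attributed to exactly one injected fault.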
πŸ“ Abstract
Assertion-based verification (ABV) is a cornerstone of modern hardware design, yet manually translating design intent into formal SystemVerilog Assertions (SVAs) remains labor-intensive and error-prone. While Large Language Models (LLMs) show promise for automating this process, existing benchmarks remain limited by unrealistic task formulations, weak specification inputs, and oversimplified evaluation. To address these limitations, we introduce AssertLLM2, an open-source benchmark for realistic assertion generation in hardware verification. AssertLLM2 contains 83 real-world designs across 13 functional categories. For each design, the benchmark provides a structured design specification, a verified dependency-complete golden RTL, and systematically mutated buggy RTL variants. These support two practical settings: bug-prevention, where assertions are generated from specifications to guard against design errors, and bug-hunting, where assertions are generated to expose discrepancies between intended behavior and faulty implementations. To the best of our knowledge, AssertLLM2 is the first benchmark to explicitly use buggy RTL as input to evaluate bug-detection capability. AssertLLM2 further adopts a more rigorous evaluation framework spanning syntactic validity, formal provability, coverage, and mutation-based bug detection. Our benchmark enables a more realistic and extensive assessment of assertion generation and establishes rigorous baselines for state-of-the-art LLMs in practical hardware verification.
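The mutation-based bug-detection criterion can be shown with a small scoring sketch. It rests on an assumption that is ours and not the paper's stated definition: a mutant counts as detected when it produces a counterexample against at least one assertion that is syntactically valid and formally proven on the golden RTL. Assertions that fail on the golden design are excluded, because they would flag every mutant spuriously. `AssertionResult`, `evaluate`, and the metric names are hypothetical. Formal-tool outcomes are passed in as booleans. Coverage is left out because it depends on tool-specific data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AssertionResult:
    """Hypothetical outcome of running one generated assertion through a formal tool."""
    name: str
    syntax_ok: bool  # parsed and elaborated without errors
    proven_on_golden: bool  # formally proven on the golden RTL
    # mutant id -> True if the tool found a counterexample on that mutant
    fails_on_mutant: Dict[str, bool] = field(default_factory=dict)


def evaluate(results: List[AssertionResult], mutant_ids: List[str]) -> Dict[str, float]:
    """Compute illustrative syntax, proof, and mutation-detection rates."""
    total = len(results)
    valid = [r for r in results if r.syntax_ok]
    proven = [r for r in valid if r.proven_on_golden]
    # Only assertions that hold on the golden design may "kill" a mutant;
    # otherwise a wrong assertion would appear to detect every bug.
    killed = {m for m in mutant_ids if any(r.fails_on_mutant.get(m, False) for r in proven)}
    return {
        "syntax_rate": len(valid) / total if total else 0.0,
        "proof_rate": len(proven) / total if total else 0.0,
        "mutation_score": len(killed) / len(mutant_ids) if mutant_ids else 0.0,
    }


if __name__ == "__main__":
    results = [
        AssertionResult("p_onehot", True, True, {"m1": True, "m2": False}),
        # Fails on the golden RTL, so its mutant failures are ignored.
        AssertionResult("p_req_ack", True, False, {"m1": True, "m2": True}),
        AssertionResult("p_bad_syntax", False, False),
    ]
    print(evaluate(results, ["m1", "m2", "m3"]))
```

In the example run, `p_req_ack` fails on both mutants. It is still ignored because it does not hold on the golden design, so only `m1` counts as detected.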
Problem

Research questions and friction points this paper is trying to address.

assertion generation
hardware verification
Large Language Models
SystemVerilog Assertions
benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

Assertion-based Verification
Large Language Models
Hardware Verification
Mutation-based Evaluation
SystemVerilog Assertions