QiMeng-CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis

📅 2026-03-15
🤖 AI Summary
General-purpose large language models perform poorly on the natural-language-to-SystemVerilog-assertion (NL2SVA) task for two main reasons: high-quality real-world SVA corpora are scarce, and there is no reliable method for checking whether a natural language specification and an SVA are semantically equivalent. To address both problems, this work proposes an RTL-grounded bidirectional data synthesis framework that uses open-source RTL designs to generate large-scale NL↔SVA translation pairs, then applies semantic consistency filtering to build high-quality training data. With this data, the authors train CodeV-SVA, the first family of specialized large models for NL2SVA. The CodeV-SVA-14B variant achieves Func.@1 accuracy of 75.8% on the NL2SVA-Human benchmark and 84.0% on the NL2SVA-Machine benchmark, matching or surpassing state-of-the-art models such as GPT-5 and DeepSeek-R1.

📝 Abstract
SystemVerilog Assertions (SVAs) are crucial for hardware verification. Recent studies leverage general-purpose LLMs to translate natural language properties to SVAs (NL2SVA), but they perform poorly due to limited data. We propose a data synthesis framework to tackle two challenges: the scarcity of high-quality real-world SVA corpora and the lack of reliable methods to determine NL-SVA semantic equivalence. For the former, large-scale open-source RTLs are used to guide LLMs to generate real-world SVAs; for the latter, bidirectional translation serves as a data selection method. With the synthesized data, we train CodeV-SVA, a series of SVA generation models. Notably, CodeV-SVA-14B achieves 75.8% on NL2SVA-Human and 84.0% on NL2SVA-Machine in Func.@1, matching or exceeding advanced LLMs like GPT-5 and DeepSeek-R1.
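
The abstract does not spell out how bidirectional translation selects data, so the following is only a minimal sketch of one plausible reading: translate each SVA to natural language, translate that description back to an SVA, and keep the pair only if the round trip agrees with the original. The names `Pair`, `normalize`, `round_trip_filter`, and the toy translators are hypothetical, and the whitespace-insensitive string comparison is a stand-in assumption for whatever equivalence check the authors actually use.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Pair:
    nl: str   # natural-language description of the property
    sva: str  # SystemVerilog assertion text


def normalize(sva: str) -> str:
    """Collapse runs of whitespace so trivially different spellings compare equal."""
    return " ".join(sva.split())


def round_trip_filter(
    svas: Iterable[str],
    sva_to_nl: Callable[[str], str],
    nl_to_sva: Callable[[str], str],
    equivalent: Callable[[str, str], bool],
) -> List[Pair]:
    """Keep (NL, SVA) pairs whose SVA -> NL -> SVA round trip is judged equivalent.

    All three callables are assumptions: in practice the translators would be
    LLM calls and `equivalent` would be a formal or simulation-based check.
    """
    kept: List[Pair] = []
    for sva in svas:
        nl = sva_to_nl(sva)        # forward direction: SVA -> NL
        back = nl_to_sva(nl)       # backward direction: NL -> SVA
        if equivalent(sva, back):  # discard pairs that do not survive the round trip
            kept.append(Pair(nl=nl, sva=sva))
    return kept


# Toy stand-ins for the two translation directions (hypothetical data).
SVA_HANDSHAKE = "assert property (@(posedge clk) req |-> ##1 ack);"
SVA_MUTEX = "assert property (@(posedge clk) !(rd && wr));"


def toy_sva_to_nl(sva: str) -> str:
    table = {
        SVA_HANDSHAKE: "ack follows req one cycle later",
        SVA_MUTEX: "rd and wr are never both high",
    }
    return table[normalize(sva)]


def toy_nl_to_sva(nl: str) -> str:
    table = {
        # Round trip succeeds up to whitespace.
        "ack follows req one cycle later":
            "assert property (@(posedge clk)   req |-> ##1 ack);",
        # Deliberately wrong back-translation, so this pair is filtered out.
        "rd and wr are never both high":
            "assert property (@(posedge clk) rd |-> wr);",
    }
    return table[nl]


def toy_equivalent(a: str, b: str) -> bool:
    return normalize(a) == normalize(b)


kept_pairs = round_trip_filter(
    [SVA_HANDSHAKE, SVA_MUTEX], toy_sva_to_nl, toy_nl_to_sva, toy_equivalent
)
```

In this toy run only the handshake assertion survives, because the mutual-exclusion description is translated back to a different property. Passing the equivalence check in as a callable keeps the filter independent of how equivalence is decided.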
Problem

Research questions and friction points this paper is trying to address.

SystemVerilog Assertions
NL2SVA
data scarcity
semantic equivalence
hardware verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

RTL-Grounded Synthesis
Bidirectional Data Selection
NL2SVA
Specialized LLMs
Hardware Assertion Generation
Yutong Wu
SKL of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences
Chenrui Cao
SKL of Processors, Institute of Computing Technology, CAS; University of Science and Technology of China
Pengwei Jin
SKL of Processors, Institute of Computing Technology, CAS
Di Huang
SKL of Processors, Institute of Computing Technology, CAS
Rui Zhang
Institute of Computing Technology, Chinese Academy of Sciences (ICT-CAS)
Computer Vision, Deep Learning
Xishan Zhang
Institute of Computing Technology of the Chinese Academy of Sciences
Zidong Du
SKL of Processors, Institute of Computing Technology, CAS
Qi Guo
SKL of Processors, Institute of Computing Technology, CAS
Xing Hu
Institute of Computing Technology, Chinese Academy of Sciences
micro-architecture, Deep learning architecture