ProtocolLLM: RTL Benchmark for SystemVerilog Generation of Communication Protocols

📅 2025-06-09

📈 Citations: 0

✨ Influential: 0

career value

211K/year

🤖 AI Summary

Current large language models (LLMs) lack standardized evaluation for hardware description language (HDL) code generation, particularly for synthesizable, functionally correct communication protocol implementations. Method: We introduce the first protocol-level RTL generation benchmark targeting SPI, I²C, UART, and AXI protocols, featuring multi-abstraction-level generation tasks and a rigorous synthesis-readiness validation pipeline—including syntax checking, logic synthesis, and UVM-driven waveform simulation. Contribution/Results: Evaluating 12 prominent open- and closed-weight LLMs, we find only two models pass all functional correctness checks, with an average synthesis success rate below 35%. Results reveal pervasive deficiencies in protocol-specific timing modeling and concurrent control handling. This benchmark fills a critical gap in evaluating LLMs for digital circuit protocol implementation and establishes a new evaluation paradigm for HDL code generation capability.

Technology Category

Application Category

📝 Abstract

Recent advances in Large Language Models (LLMs) have shown promising capabilities in generating code for general-purpose programming languages. In contrast, their applicability for hardware description languages, particularly for generating synthesizable and functionally correct designs, remains significantly underexplored. HDLs such as SystemVerilog are logic-oriented and demand strict adherence to timing semantics, concurrency, and synthesizability constraints. Moreover, HDL-based design flows encompass a broad set of tasks beyond structural code generation, including testbench development, assertion-based verification, timing closure, and protocol-level integration for on-chip communication. The objective of our paper is to analyze the capabilities of state-of-the-art LLMs in generating SystemVerilog implementations of standard communication protocols, a core component of embedded and System-on-Chip (SoC) architectures. This paper introduces the first benchmark suite targeting four widely used protocols: SPI, I2C, UART, and AXI. We define code generation tasks that capture varying levels of design abstraction and prompt specificity. The generated designs are assessed for syntactic correctness, synthesizability, and functional fidelity via waveform simulation and test benches.

Problem

Research questions and friction points this paper is trying to address.

Assessing LLMs for SystemVerilog code generation in hardware design

Evaluating synthesizability and functional correctness of protocol implementations

Creating benchmark for SPI, I2C, UART, and AXI protocol generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark suite for SystemVerilog protocol generation

Assesses LLM-generated designs for correctness and synthesizability

Focuses on SPI, I2C, UART, and AXI protocols

🔎 Similar Papers

No similar papers found.