AI Summary
Large language models (LLMs) exhibit limited capability on hardware design tasks, particularly RTL code understanding, because they cannot infer dynamic timing behavior from static syntax alone; this hinders performance on downstream applications such as code completion, documentation generation, and verification. To address this, we propose an AI agent framework grounded in a hypothesis-generation and experimental-validation loop: the LLM formulates behavioral hypotheses about RTL modules, which are then rigorously tested via EDA toolchains (including simulation, waveform analysis, and formal verification) to iteratively refine the model's internal representation. This approach moves beyond purely syntactic parsing, enabling deeper semantic and temporal modeling of RTL designs. Evaluation on formal verification tasks shows that our method significantly outperforms baseline LLM flows lacking experimental feedback, demonstrating the effectiveness of dynamic, experiment-driven reasoning for hardware design understanding.
Abstract
AI agents powered by large language models (LLMs) are being used to solve increasingly complex software engineering challenges, but they struggle with hardware design tasks. Register Transfer Level (RTL) code presents a unique challenge for LLMs, as it encodes complex, dynamic, time-evolving behaviors using the low-level language features of SystemVerilog. LLMs struggle to infer these behaviors from the syntax of RTL alone, which limits their ability to complete downstream tasks such as code completion, documentation, and verification. In response, we present DUET: a general methodology for developing Design Understanding via Experimentation and Testing. DUET mimics how hardware design experts develop an understanding of complex designs: not via a one-off readthrough of the RTL, but via iterative experimentation with a range of tools. DUET repeatedly generates hypotheses, tests them with EDA tools (e.g., simulation, waveform inspection, and formal verification), and integrates the results to build a bottom-up understanding of the design. In our evaluations, we show that DUET improves AI agent performance on formal verification compared to a baseline flow without experimentation.
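The iterative loop the abstract describes (generate hypotheses, test them with EDA tools, fold the results back into the agent's understanding) can be sketched as follows. This is a minimal illustration only: every function name and the hard-coded hypotheses are hypothetical stand-ins, not the paper's implementation, which would drive an LLM and real simulation/formal tools at these points.

```python
# Illustrative sketch of a DUET-style hypothesize/test/integrate loop.
# All names and data here are stand-ins for the LLM and EDA tool calls
# described in the abstract; nothing below comes from the actual system.

def propose_hypotheses(rtl_source, knowledge):
    """Stand-in for the LLM step: propose behaviors not yet tested."""
    candidates = [
        "counter wraps to 0 at MAX",
        "reset is synchronous",
        "output valid lags input by one cycle",
    ]
    return [h for h in candidates if h not in knowledge]

def run_experiment(hypothesis):
    """Stand-in for an EDA check (simulation, waveforms, or formal).
    Here we pretend formal verification refutes one hypothesis."""
    return hypothesis != "reset is synchronous"

def build_understanding(rtl_source, max_iters=5):
    """Iterate until no new hypotheses remain, accumulating results."""
    knowledge = {}  # hypothesis -> confirmed (True) / refuted (False)
    for _ in range(max_iters):
        hypotheses = propose_hypotheses(rtl_source, knowledge)
        if not hypotheses:
            break  # nothing left to test: understanding has converged
        for h in hypotheses:
            knowledge[h] = run_experiment(h)
    return knowledge

model = build_understanding("module counter(...); endmodule")
for hypothesis, confirmed in model.items():
    print(f"{'confirmed' if confirmed else 'refuted'}: {hypothesis}")
```

The key design point mirrored here is that understanding is built bottom-up from experiment outcomes rather than from a single read of the source: refuted hypotheses are retained alongside confirmed ones, so later proposal steps can condition on both.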