🤖 AI Summary
Large language models (LLMs) frequently generate code that fails formal verification, hindering their deployment in hardware and safety-critical applications.
Method: This paper introduces Proof2Silicon, a novel verification-driven, end-to-end framework for generating hardware designs from natural language. It embeds PREFACE, a reinforcement learning-based prompt optimization framework, to steer frozen LLMs toward Dafny code that passes formal verification, with no model fine-tuning. Verified Dafny programs are then translated automatically into synthesizable C using Dafny's Python backend and PyLog, and Vivado HLS turns that C into RTL.
Contribution/Results: Evaluated on a 100-task benchmark, PREFACE's prompt optimization improves Dafny verification success rates by up to 21% across the LLMs tested, and Proof2Silicon achieves an end-to-end hardware synthesis success rate of up to 72%. The result is a scalable, fully automated pipeline for correctness-by-construction synthesis from natural language to silicon.
📝 Abstract
Large Language Models (LLMs) have demonstrated impressive capabilities in automated code generation but frequently produce code that fails formal verification, an essential requirement for hardware and safety-critical domains. To overcome this fundamental limitation, we previously proposed PREFACE, a model-agnostic framework based on reinforcement learning (RL) that iteratively repairs the prompts provided to frozen LLMs, systematically steering them toward generating formally verifiable Dafny code without costly fine-tuning. This work presents Proof2Silicon, a novel end-to-end synthesis framework that embeds the previously proposed PREFACE flow to enable the generation of correctness-by-construction hardware directly from natural language specifications. Proof2Silicon operates by: (1) leveraging PREFACE's verifier-driven RL agent to optimize prompt generation iteratively, ensuring Dafny code correctness; (2) automatically translating verified Dafny programs into synthesizable high-level C using Dafny's Python backend and PyLog; and (3) employing Vivado HLS to produce RTL implementations. Evaluated rigorously on a challenging 100-task benchmark, PREFACE's RL-guided prompt optimization consistently improved Dafny verification success rates across diverse LLMs by up to 21%. Crucially, Proof2Silicon achieved an end-to-end hardware synthesis success rate of up to 72%, generating RTL designs through Vivado HLS synthesis flows. These results demonstrate a robust, scalable, and automated pipeline for LLM-driven, formally verified hardware synthesis, bridging natural-language specification and silicon realization.
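The verifier-driven prompt repair in step (1) can be pictured with a minimal sketch. This is not the authors' implementation: the names `repair_loop`, `VerifyResult`, `generate`, `verify`, and `revise` are hypothetical, the RL agent is replaced by a plain rule that appends verifier feedback to the prompt, and the LLM and the Dafny verifier are replaced by toy stand-ins so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class VerifyResult:
    ok: bool        # True when the verifier accepted the program
    feedback: str   # verifier error text, empty on success


def repair_loop(
    task: str,
    generate: Callable[[str], str],
    verify: Callable[[str], VerifyResult],
    revise: Callable[[str, str, VerifyResult], str],
    max_iters: int = 5,
) -> Tuple[Optional[str], int]:
    """Return (verified_code, attempts), or (None, max_iters) on failure."""
    prompt = task
    for attempt in range(1, max_iters + 1):
        code = generate(prompt)   # the model is frozen; only the prompt changes
        result = verify(code)
        if result.ok:
            return code, attempt
        # Assumption: a simple rule stands in for the RL policy here.
        prompt = revise(prompt, code, result)
    return None, max_iters


# Toy stand-ins (hypothetical) for the LLM, the verifier, and the prompt reviser.
def toy_generate(prompt: str) -> str:
    # Pretend the model only writes a postcondition when the prompt mentions one.
    if "ensures" in prompt:
        return ("method Abs(x: int) returns (y: int) ensures y >= 0 "
                "{ y := if x < 0 then -x else x; }")
    return "method Abs(x: int) returns (y: int) { y := x; }"


def toy_verify(code: str) -> VerifyResult:
    # A string check stands in for running the real Dafny verifier.
    if "ensures" in code:
        return VerifyResult(True, "")
    return VerifyResult(False, "missing postcondition: add an ensures clause")


def toy_revise(prompt: str, code: str, result: VerifyResult) -> str:
    return f"{prompt}\nVerifier feedback: {result.feedback}"


code, attempts = repair_loop(
    "Write a Dafny method Abs.", toy_generate, toy_verify, toy_revise
)
print(attempts)
print(code)
```

In a real setup, `generate` would call an LLM and `verify` would invoke the Dafny toolchain. Passing both in as callables keeps the loop independent of any particular model, in the spirit of the model-agnostic design the abstract describes.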