Proof2Silicon: Prompt Repair for Verified Code and Hardware Generation via Reinforcement Learning

📅 2025-09-07
🤖 AI Summary
Large language models (LLMs) frequently generate code that fails formal verification, hindering their deployment in hardware and safety-critical applications. Method: This paper introduces Proof2Silicon, a verification-driven, end-to-end natural-language-to-silicon design generation framework. It integrates PREFACE, a reinforcement learning-based prompt optimization framework, into hardware synthesis without model fine-tuning, steering frozen LLMs to produce Dafny-verified code. The framework automatically translates verified Dafny programs into synthesizable C and then RTL via a pipeline combining the Dafny Python backend, PyLog-based logic translation, and Vivado HLS high-level synthesis. Contribution/Results: Evaluated on a 100-task benchmark, the framework achieves an end-to-end success rate of up to 72% and improves Dafny verification pass rates by up to 21 percentage points. It enables correctness-by-construction, scalable, and fully automated NL-to-silicon synthesis.

📝 Abstract
Large Language Models (LLMs) have demonstrated impressive capabilities in automated code generation but frequently produce code that fails formal verification, an essential requirement for hardware and safety-critical domains. To overcome this fundamental limitation, we previously proposed PREFACE, a model-agnostic framework based on reinforcement learning (RL) that iteratively repairs the prompts provided to frozen LLMs, systematically steering them toward generating formally verifiable Dafny code without costly fine-tuning. This work presents Proof2Silicon, a novel end-to-end synthesis framework that embeds the previously proposed PREFACE flow to enable the generation of correctness-by-construction hardware directly from natural language specifications. Proof2Silicon operates by: (1) leveraging PREFACE's verifier-driven RL agent to optimize prompt generation iteratively, ensuring Dafny code correctness; (2) automatically translating verified Dafny programs into synthesizable high-level C using Dafny's Python backend and PyLog; and (3) employing Vivado HLS to produce RTL implementations. Evaluated rigorously on a challenging 100-task benchmark, PREFACE's RL-guided prompt optimization consistently improved Dafny verification success rates across diverse LLMs by up to 21%. Crucially, Proof2Silicon achieved an end-to-end hardware synthesis success rate of up to 72%, generating RTL designs through Vivado HLS synthesis flows. These results demonstrate a robust, scalable, and automated pipeline for LLM-driven, formally verified hardware synthesis, bridging natural-language specification and silicon realization.
Problem

Research questions and friction points this paper is trying to address.

Generating formally verifiable code from natural language using LLMs
Automated repair of LLM prompts to ensure hardware correctness
End-to-end synthesis of verified hardware from specifications
Innovation

Methods, ideas, or system contributions that make the work stand out.

RL-based prompt repair for verifiable code generation
Automated translation from Dafny to synthesizable C code
End-to-end hardware synthesis from natural language specifications
Manvi Jha
University of Illinois Urbana-Champaign
Jiaxin Wan
University of Illinois Urbana-Champaign
Deming Chen
Abel Bliss Professor, University of Illinois at Urbana-Champaign
High-level Synthesis · Hybrid Cloud · FPGAs · Machine Learning · Hardware Security