From Evaluation to Enhancement: Large Language Models for Zero-Knowledge Proof Code Generation

📅 2025-09-15

🤖 AI Summary
Writing zero-knowledge proof (ZKP) programs is knowledge-intensive and error-prone because it requires reasoning about finite-field arithmetic, constraint systems, and gadgets, and how well large language models (LLMs) handle this domain has not been characterized. To address this gap, we introduce ZK-Eval, a domain-specific evaluation pipeline for ZKP code generation that probes language knowledge, gadget competence, and end-to-end program generation. It shows that models handle surface-level syntax well but struggle with gadget usage and semantic correctness. Building on these findings, we propose ZK-Coder, an agentic framework that combines constraint sketching, guided retrieval, and interactive repair. On Circom and Noir, ZK-Coder raises code generation success rates from 17.35% to 83.38% and from 32.21% to 90.05%, respectively. The work aims to lower the barrier to ZKP development and advance trustworthy computation.

📝 Abstract
Zero-knowledge proofs (ZKPs) are increasingly deployed in domains such as privacy-preserving authentication, blockchain scalability, and secure finance. However, authoring ZK programs remains challenging: unlike mainstream programming, ZK development requires reasoning about finite field arithmetic, constraint systems, and gadgets, making it knowledge-intensive and error-prone. While large language models (LLMs) have demonstrated strong code generation capabilities in general-purpose languages, their effectiveness for ZK programming, where correctness hinges on both language mastery and gadget-level reasoning, remains unexplored. To address this gap, we propose ZK-Eval, a domain-specific evaluation pipeline that probes LLM capabilities at three levels: language knowledge, gadget competence, and end-to-end program generation. Our evaluation of four state-of-the-art LLMs reveals that models excel at surface-level syntax but struggle with gadget usage and semantic correctness, often yielding incorrect programs. Based on these insights, we introduce ZK-Coder, an agentic framework that augments LLMs with constraint sketching, guided retrieval, and interactive repair. Experiments on Circom and Noir show substantial gains, with success rates improving from 17.35% to 83.38% and from 32.21% to 90.05%, respectively. With ZK-Eval and ZK-Coder, we establish a foundation for systematically measuring and augmenting LLMs in ZK code generation to lower barriers for practitioners and advance trustworthy computation.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs for zero-knowledge proof code generation capabilities
Addressing LLM struggles with gadget usage and semantic correctness
Augmenting LLMs to improve ZKP program generation success rates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-specific evaluation pipeline for LLMs
Agentic framework with constraint sketching
Guided retrieval and interactive repair
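
The three ingredients listed above (constraint sketching, guided retrieval, interactive repair) can be pictured as a generate-check-repair loop. The Python sketch below is only an illustration of that general pattern, not the ZK-Coder implementation: every name in it (generate_with_repair, CheckResult, the toy_* helpers, the max_rounds parameter) is hypothetical, and the toy helpers stand in for an LLM call, a gadget retriever, and a compiler or test harness that the summary does not describe in detail.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CheckResult:
    """Outcome of checking one candidate program (hypothetical type)."""
    ok: bool
    feedback: str = ""


def generate_with_repair(
    task: str,
    sketch: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, str, List[str], str], str],
    check: Callable[[str], CheckResult],
    max_rounds: int = 3,
) -> Optional[str]:
    """Sketch -> retrieve -> generate, then repair using checker feedback.

    Returns the first candidate that passes the check, or None if no
    candidate passes within max_rounds attempts.
    """
    plan = sketch(task)        # assumed: a constraint-level outline of the task
    hints = retrieve(plan)     # assumed: gadget descriptions relevant to the plan
    feedback = ""
    for _ in range(max_rounds):
        candidate = generate(task, plan, hints, feedback)
        result = check(candidate)
        if result.ok:
            return candidate
        feedback = result.feedback  # feed the error back into the next attempt
    return None


# Toy stand-ins so the loop can be run without a model or a ZK toolchain.
def toy_sketch(task: str) -> str:
    return f"constraints for: {task}"


def toy_retrieve(plan: str) -> List[str]:
    return ["placeholder gadget description"]


def toy_generate(task: str, plan: str, hints: List[str], feedback: str) -> str:
    # The first attempt is a draft; any feedback produces the "fixed" version.
    return "fixed program" if feedback else "draft program"


def toy_check(code: str) -> CheckResult:
    if code == "fixed program":
        return CheckResult(ok=True)
    return CheckResult(ok=False, feedback="constraint not satisfied")


if __name__ == "__main__":
    print(generate_with_repair(
        "check that x is nonzero", toy_sketch, toy_retrieve, toy_generate, toy_check
    ))
```

With the toy helpers, the first candidate fails the check and the second passes, so the loop returns after two rounds. In a real setting the check step would presumably compile the candidate and run it against test inputs, which is where the gap between syntactic and semantic correctness reported in the abstract would surface.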