🤖 AI Summary
Generating zero-knowledge proof (ZKP) code is difficult: it demands domain expertise in finite-field arithmetic, constraint-system design, and gadget construction, and it is error-prone. How well large language models (LLMs) handle this domain has not yet been characterized. To address this gap, we introduce ZK-Eval, the first comprehensive benchmark suite for ZKP code generation, which exposes a substantial gap between syntactic and semantic correctness across models. Building on these insights, we propose ZK-Coder, an agent-based generation framework that combines constraint sketching, gadget-aware retrieval-augmented generation (RAG), and interactive repair. ZK-Coder supports both Circom and Noir and raises code generation success rates from 17.35% to 83.38% on Circom and from 32.21% to 90.05% on Noir. This work lowers the barrier to ZKP development and supports the practical deployment of trustworthy computing systems.
📝 Abstract
Zero-knowledge proofs (ZKPs) are increasingly deployed in domains such as privacy-preserving authentication, blockchain scalability, and secure finance. However, authoring ZK programs remains challenging: unlike mainstream programming, ZK development requires reasoning about finite field arithmetic, constraint systems, and gadgets, making it knowledge-intensive and error-prone. While large language models (LLMs) have demonstrated strong code generation capabilities in general-purpose languages, their effectiveness for ZK programming, where correctness hinges on both language mastery and gadget-level reasoning, remains unexplored. To address this gap, we propose ZK-Eval, a domain-specific evaluation pipeline that probes LLM capabilities at three levels: language knowledge, gadget competence, and end-to-end program generation. Our evaluation of four state-of-the-art LLMs reveals that models excel at surface-level syntax but struggle with gadget usage and semantic correctness, often yielding incorrect programs. Based on these insights, we introduce ZK-Coder, an agentic framework that augments LLMs with constraint sketching, guided retrieval, and interactive repair. Experiments on Circom and Noir show substantial gains, with success rates improving from 17.35% to 83.38% and from 32.21% to 90.05%, respectively. With ZK-Eval and ZK-Coder, we establish a foundation for systematically measuring and augmenting LLMs in ZK code generation to lower barriers for practitioners and advance trustworthy computation.
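To illustrate why syntactic correctness does not imply semantic correctness in this setting, the following is a minimal Python sketch of the well-known IsZero gadget expressed as constraints over a prime field. It is not taken from the paper: the toy modulus `P = 101` and all function names are assumptions chosen for illustration. The sketch compares the full constraint set with an under-constrained variant that looks plausible but accepts a forged witness.

```python
# Toy prime field modulus (assumption for illustration; real ZK systems
# use a much larger prime).
P = 101


def is_zero_witness(x):
    """Honest prover: compute (inv, out) so that out == 1 iff x == 0."""
    x %= P
    # Fermat inverse for nonzero x; any value works for inv when x == 0.
    inv = pow(x, P - 2, P) if x != 0 else 0
    out = (1 - x * inv) % P
    return inv, out


def constraints_full(x, inv, out):
    """Both constraints of the gadget: out = 1 - x*inv and x*out = 0."""
    c1 = (out - (1 - x * inv)) % P == 0
    c2 = (x * out) % P == 0
    return c1 and c2


def constraints_underconstrained(x, inv, out):
    """Hypothetical buggy variant that omits the x*out = 0 constraint."""
    return (out - (1 - x * inv)) % P == 0


# An honest witness satisfies the full constraint set.
inv, out = is_zero_witness(5)
honest_ok = constraints_full(5, inv, out)

# A forged witness claims that 5 is zero (out = 1) by choosing inv = 0.
forged = (5, 0, 1)
forged_passes_buggy = constraints_underconstrained(*forged)
forged_passes_full = constraints_full(*forged)
```

In this sketch the forged witness satisfies the under-constrained variant but is rejected by the full constraint set. This is the kind of semantic error that a syntax check or a successful compile alone would not catch.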