🤖 AI Summary
Generating correct, compilable TFHE code, particularly for logic gates and ReLU activation functions, is challenging due to high expert dependency, frequent errors, and low compilation success rates. Method: We propose the first LLM-driven integrated evaluation framework for TFHE compilation, incorporating RAG-enhanced few-shot prompting, a novel TFHE-specific encrypted code generation benchmark, tight integration with the tfhe-rs compiler, multi-source LLM inference (open- and closed-weight), and structural similarity quantification. Contribution/Results: Experiments show that off-the-shelf LLMs yield a 72% error rate and only 31% compilation success; our framework reduces errors significantly, improves structural similarity by over 40%, and raises compilation success to 89%. This work provides the first systematic validation of LLMs' feasibility and effectiveness in generating privacy-preserving, low-level homomorphic encryption code.
📝 Abstract
Fully Homomorphic Encryption over the torus (TFHE) enables computation on encrypted data without decryption, making it a cornerstone of secure and confidential computing. Despite its potential in privacy-preserving machine learning, secure multi-party computation, private blockchain transactions, and secure medical diagnostics, its adoption remains limited due to cryptographic complexity and usability challenges. While various TFHE libraries and compilers exist, practical code generation remains a hurdle. We propose a compiler-integrated framework to evaluate LLM inference and agentic optimization for TFHE code generation, focusing on logic gates and ReLU activation. Our methodology assesses error rates, compilability, and structural similarity across open- and closed-source LLMs. Results highlight significant limitations in off-the-shelf models, while agentic optimizations such as retrieval-augmented generation (RAG) and few-shot prompting reduce errors and enhance code fidelity. This work establishes the first benchmark for TFHE code generation, demonstrating how LLMs, when augmented with domain-specific feedback, can bridge the expertise gap in FHE code generation.