Code Generation for Cryptographic Kernels using Multi-word Modular Arithmetic on GPU

📅 2025-01-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large-integer modular arithmetic, critical for fully homomorphic encryption (FHE) and zero-knowledge proofs (ZKPs), incurs prohibitively high computational overhead on GPUs, severely hindering practical deployment. Method: The paper proposes MoMA (multi-word modular arithmetic), a type-driven recursive rewriting system that breaks large-integer modular arithmetic down into machine-word operations, yielding a compiler-friendly, hardware-portable abstraction. MoMA automatically generates GPU-accelerated cryptographic kernels via typed rewriting rules, BLAS/NTT kernel instantiation, and GPU backend code synthesis. Contribution/Results: MoMA-generated BLAS kernels outperform state-of-the-art multi-precision libraries by orders of magnitude, and its NTT kernels achieve near-ASIC performance on commodity GPUs. By unifying modular arithmetic under a portable, type-safe abstraction, MoMA improves both the execution efficiency and the hardware adaptability of FHE and ZKP workloads.

📝 Abstract
Fully homomorphic encryption (FHE) and zero-knowledge proofs (ZKPs) are emerging as solutions for data security in distributed environments. However, the widespread adoption of these encryption techniques is hindered by their significant computational overhead, primarily resulting from core cryptographic operations that involve large integer arithmetic. This paper presents a formalization of multi-word modular arithmetic (MoMA), which breaks down large bit-width integer arithmetic into operations on machine words. We further develop a rewrite system that implements MoMA through recursive rewriting of data types, designed for compatibility with compiler infrastructures and code generators. We evaluate MoMA by generating cryptographic kernels, including basic linear algebra subprogram (BLAS) operations and the number theoretic transform (NTT), targeting various GPUs. Our MoMA-based BLAS operations outperform state-of-the-art multi-precision libraries by orders of magnitude, and MoMA-based NTTs achieve near-ASIC performance on commodity GPUs.
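To illustrate the decomposition the abstract describes, here is a minimal sketch of multi-word modular addition: a 128-bit value is represented as two 64-bit machine words, and the wide operation is rewritten into word-level adds with explicit carry propagation, followed by a conditional subtraction for the modular reduction. The `u128` type and function names are illustrative, not MoMA's actual API, and the sketch assumes operands are already reduced below a modulus m < 2^127 so the intermediate sum fits in 128 bits.

```c
#include <stdint.h>

/* A 128-bit integer as two 64-bit machine words (lo, hi) --
   the kind of recursive type decomposition MoMA formalizes.
   Hypothetical names, not the paper's actual interface. */
typedef struct { uint64_t lo, hi; } u128;

/* Multi-word addition: add low words, then propagate the carry. */
static u128 add128(u128 a, u128 b) {
    u128 r;
    r.lo = a.lo + b.lo;
    uint64_t carry = (r.lo < a.lo);   /* unsigned wraparound detects carry */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Comparison and subtraction with borrow, used for the reduction step. */
static int geq128(u128 a, u128 b) {
    return (a.hi != b.hi) ? (a.hi > b.hi) : (a.lo >= b.lo);
}

static u128 sub128(u128 a, u128 b) {
    u128 r;
    uint64_t borrow = (a.lo < b.lo);
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - borrow;
    return r;
}

/* Modular addition: (a + b) mod m, assuming a < m, b < m, m < 2^127. */
static u128 addmod128(u128 a, u128 b, u128 m) {
    u128 s = add128(a, b);
    return geq128(s, m) ? sub128(s, m) : s;
}
```

On a GPU target, the carry chain would map to hardware add-with-carry instructions rather than the portable wraparound test shown here; the point of the sketch is only the word-level rewriting itself.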
Problem

Research questions and friction points this paper is trying to address.

Homomorphic Encryption
Zero-Knowledge Proofs
Computational Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

MoMA
GPU Efficiency
Homomorphic Encryption and Zero-Knowledge Proofs