Leveraging ASIC AI Chips for Homomorphic Encryption

📅 2025-01-13
🤖 AI Summary
Homomorphic encryption (HE) is hard to deploy in cloud environments because of its high computational overhead and the cost of domain-specific accelerators. To address this, the paper introduces CROSS, presented as the first HE compilation framework tailored for AI accelerators, and evaluates it on Google's TPUv4. The approach bridges the gap between HE arithmetic and AI hardware by recasting HE operations as dense matrix computation. Key contributions include: (1) adapting HE modular multiplication and high-precision arithmetic to the matrix-centric execution model of AI chips; (2) three compilation mapping techniques: Barrett modular reduction, Basis Aligned Transformation (BAT), and Matrix Aligned Transformation (MAT). On TPUv4, CROSS achieves up to 161× speedup over previous work on many-core CPUs and up to 5× over an NVIDIA V100 GPU. The kernel-level code is open-sourced.
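
The summary names Barrett modular reduction as the way to get modular arithmetic out of plain multipliers and adders. A minimal textbook sketch of that idea in plain Python follows. It is not taken from CROSS or jaxite; the function names and the example modulus are hypothetical, chosen only for illustration.

```python
def barrett_precompute(q: int) -> tuple[int, int]:
    """Precompute the shift k and constant mu = floor(4^k / q) for modulus q."""
    k = q.bit_length()           # smallest k with q < 2^k
    mu = (1 << (2 * k)) // q     # one-time division, done offline
    return k, mu


def barrett_mulmod(a: int, b: int, q: int, k: int, mu: int) -> int:
    """Return (a * b) mod q for 0 <= a, b < q using multiply, shift, subtract."""
    x = a * b                    # x < q^2 < 4^k
    t = (x * mu) >> (2 * k)      # estimate of floor(x / q), never too large
    r = x - t * q                # 0 <= r < 2q for inputs in range
    if r >= q:                   # a single conditional subtraction suffices
        r -= q
    return r


# Hypothetical example modulus (2^28 - 2^16 + 1), used only for illustration.
Q = 268369921
K, MU = barrett_precompute(Q)
```

Calling `barrett_mulmod(a, b, Q, K, MU)` should agree with `(a * b) % Q` whenever both inputs are already reduced modulo `Q`. The only division happens in the precomputation, which is what makes the scheme attractive on hardware that offers multipliers and adders but no native modulo.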

📝 Abstract
Cloud-based services are making the outsourcing of sensitive client data increasingly common. Although homomorphic encryption (HE) offers strong privacy guarantees, it requires substantially more resources than computing on plaintext, often leading to unacceptably large latencies in getting the results. HE accelerators have emerged to mitigate this latency issue, but at the high cost of ASICs. In this paper we show that HE primitives can be converted to AI operators and accelerated on existing ASIC AI accelerators, like TPUs, which are already widely deployed in the cloud. Adapting such accelerators for HE requires (1) supporting modular multiplication, (2) high-precision arithmetic in software, and (3) efficient mapping on matrix engines. We introduce the CROSS compiler, which (1) adopts Barrett reduction to provide modular reduction support using multipliers and adders, (2) applies Basis Aligned Transformation (BAT) to convert high-precision multiplication into low-precision matrix-vector multiplication, and (3) applies Matrix Aligned Transformation (MAT) to convert vectorized modular operations with reduction into matrix multiplication that can be efficiently processed on a 2D spatial matrix engine. Our evaluation of CROSS on a Google TPUv4 demonstrates significant performance improvements, with up to 161x and 5x speedup compared to previous work on many-core CPUs and V100. The kernel-level code is open-sourced at https://github.com/google/jaxite.git.
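
The abstract describes turning a high-precision multiplication into a low-precision matrix-vector multiplication. One generic way to illustrate that idea is sketched below: both operands are split into 8-bit limbs, one operand's limbs are arranged as a Toeplitz matrix, and the product becomes a small matrix-vector multiply followed by a shift-and-add recombination. This is an assumption-laden illustration of limb decomposition in general, not the paper's BAT algorithm; the limb width, the four-limb default, and all function names are hypothetical.

```python
LIMB_BITS = 8  # assumed low-precision width, for illustration only


def to_limbs(x: int, n: int) -> list[int]:
    """Split x into n little-endian limbs of LIMB_BITS bits each."""
    mask = (1 << LIMB_BITS) - 1
    return [(x >> (LIMB_BITS * i)) & mask for i in range(n)]


def toeplitz_from_limbs(a_limbs: list[int]) -> list[list[int]]:
    """Build the (2n-1) x n matrix T with T[k][j] = a_limbs[k - j] (0 if out of range)."""
    n = len(a_limbs)
    return [
        [a_limbs[k - j] if 0 <= k - j < n else 0 for j in range(n)]
        for k in range(2 * n - 1)
    ]


def matvec(m: list[list[int]], v: list[int]) -> list[int]:
    """Plain matrix-vector product; every scalar multiply is limb-by-limb."""
    return [sum(r * x for r, x in zip(row, v)) for row in m]


def from_columns(cols: list[int]) -> int:
    """Recombine column sums: result = sum(cols[k] * 2^(LIMB_BITS * k))."""
    return sum(c << (LIMB_BITS * k) for k, c in enumerate(cols))


def mul_via_matvec(a: int, b: int, n: int = 4) -> int:
    """Multiply two values below 2^(LIMB_BITS * n) via a limb-level matvec."""
    t = toeplitz_from_limbs(to_limbs(a, n))
    return from_columns(matvec(t, to_limbs(b, n)))
```

For operands below 2^32, `mul_via_matvec(a, b)` should equal `a * b`. Each entry of the matrix-vector product is a sum of at most four 8-bit by 8-bit products, so it fits comfortably in a 32-bit accumulator. When one operand is known ahead of time, its matrix can be built once and reused.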
Problem

Research questions and friction points this paper is trying to address.

Homomorphic Encryption
Cloud Services
Data Privacy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Homomorphic Encryption Acceleration
AI Accelerator Utilization
CROSS Compiler Design

Jianming Tong
PhD candidate at Georgia Tech; Visiting Researcher @ MIT
Computer Architecture, Privacy-preserving AI

Tianhao Huang
Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

Leo de Castro
MIT
Cryptography, Computer Science

Anirudh Itagi
Student, Georgia Institute of Technology

Jing Dang
Georgia Institute of Technology, Atlanta, Georgia, USA

Anupam Golder
Intel Corporation
VLSI, Hardware Security, Cryptography, Machine Learning, Analog IC Design

Asra Ali
Google, Austin, Texas, USA

Jevin Jiang
Google, Sunnyvale, California, USA

Arvind
Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

G. E. Suh
Cornell University/NVIDIA, Ithaca, New York, USA

Tushar Krishna
Associate Professor, Georgia Tech
Computer Architecture, Interconnection Networks, Network-on-Chip, Deep Learning Accelerators