FLASH-FHE: A Heterogeneous Architecture for Fully Homomorphic Encryption Acceleration

📅 2025-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing FHE accelerators follow a homogeneous design and therefore perform poorly on real-world workloads that mix shallow and deep computations. To address this, the paper proposes FLASH-FHE, the first fully homomorphic encryption (FHE) accelerator with a heterogeneous architecture for mixed workloads. The architecture comprises two complementary cluster types: bootstrappable clusters, optimized for deep computations, and swift clusters, optimized for shallow ones. One bootstrappable cluster and two swift clusters form a cluster affiliation. Key techniques include dynamic decomposition of the bootstrappable cluster into multiple swift pipelines, on-chip memory sharing between the two cluster types, and a scheduling scheme that uses all affiliations for deep workloads while assigning one shallow workload per affiliation. Implemented in RTL and synthesized with 7 nm and 14/12 nm technology nodes, the design achieves average speedups of 1.4× and 11.2× over CraterLake and F1, respectively, for deep workloads, and up to 8.0× speedup for shallow workloads.

📝 Abstract
While many hardware accelerators have recently been proposed to address the inefficiency problem of fully homomorphic encryption (FHE) schemes, none of them is able to deliver optimal performance when facing real-world FHE workloads consisting of a mixture of shallow and deep computations, due primarily to their homogeneous design principle. This paper presents FLASH-FHE, the first FHE accelerator with a heterogeneous architecture for mixed workloads. At its heart, FLASH-FHE designs two types of computation clusters, i.e., bootstrappable and swift, to optimize for deep and shallow workloads respectively in terms of cryptographic parameters and hardware pipelines. We organize one bootstrappable and two swift clusters into one cluster affiliation, and present a scheduling scheme that provides sufficient acceleration for deep FHE workloads by utilizing all the affiliations, while improving parallelism for shallow FHE workloads by assigning one shallow workload per affiliation and dynamically decomposing the bootstrappable cluster into multiple swift pipelines to accelerate the assigned workload. We further show that these two types of clusters can share valuable on-chip memory, improving performance without significant resource consumption. We implement FLASH-FHE in RTL and synthesize it using both 7nm and 14/12nm technology nodes, and our experimental results demonstrate that FLASH-FHE achieves an average performance improvement of $1.4\times$ and $11.2\times$ compared to state-of-the-art FHE accelerators CraterLake and F1 for deep workloads, while delivering up to $8.0\times$ speedup for shallow workloads due to its heterogeneous architecture.
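The scheduling policy described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's scheduler: the names `Affiliation` and `schedule`, the round-robin placement of shallow workloads, and the number of swift pipelines a bootstrappable cluster decomposes into (`pipelines_per_bootstrappable`, set to 4 here) are all hypothetical. The abstract says only "multiple" pipelines and does not give the number of affiliations.

```python
from dataclasses import dataclass


@dataclass
class Affiliation:
    """One bootstrappable cluster plus two swift clusters, as in the abstract."""
    ident: int
    swift_clusters: int = 2
    # Hypothetical value: the abstract says the bootstrappable cluster is
    # decomposed into "multiple" swift pipelines but gives no count.
    pipelines_per_bootstrappable: int = 4

    def swift_pipelines(self) -> int:
        """Swift-capable units a shallow workload could use in this affiliation."""
        return self.swift_clusters + self.pipelines_per_bootstrappable


def schedule(workloads, affiliations):
    """Map (name, kind) workloads to the affiliation ids they would occupy.

    Deep workloads are spread over every affiliation. Shallow workloads get
    one affiliation each; round-robin placement is an assumption of this sketch.
    """
    plan = []
    next_slot = 0
    for name, kind in workloads:
        if kind == "deep":
            plan.append((name, [a.ident for a in affiliations]))
        elif kind == "shallow":
            chosen = affiliations[next_slot % len(affiliations)]
            next_slot += 1
            plan.append((name, [chosen.ident]))
        else:
            raise ValueError(f"unknown workload kind: {kind!r}")
    return plan


if __name__ == "__main__":
    affs = [Affiliation(0), Affiliation(1)]
    jobs = [("bootstrap-heavy", "deep"), ("lookup-a", "shallow"), ("lookup-b", "shallow")]
    for name, ids in schedule(jobs, affs):
        print(name, "->", ids)
```

The model captures only placement. It says nothing about timing, memory sharing, or how the decomposition is realized in hardware, all of which the abstract leaves to the full paper.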
Problem

Research questions and friction points this paper is trying to address.

Homomorphic Encryption
Hardware Accelerator
Computational Flexibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hardware Accelerator
Fully Homomorphic Encryption
Dual Computational Unit Architecture
Junxue Zhang
University of Science and Technology of China
Data Center Networking, ML System, RDMA
Xiaodian Cheng
University of Waterloo
Gang Cao
Clustar
Meng Dai
Clustar
Yijun Sun
Professor of Bioinformatics, State University of New York at Buffalo
AI, Machine Learning, Bioinformatics, Computational Biology, Cancer Genomics
Han Tian
University of Science and Technology of China
Machine learning, networking, privacy computing
Dian Shen
Southeast University
Yong Wang
iSINGLab @ Hong Kong University of Science and Technology
Kai Chen
iSINGLab @ Hong Kong University of Science and Technology