EFFACT: A Highly Efficient Full-Stack FHE Acceleration Platform

📅 2025-03-01
🏛️ International Symposium on High-Performance Computer Architecture
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Fully Homomorphic Encryption (FHE) suffers from severe ciphertext expansion (up to 1000×) and performance bottlenecks in bootstrapping and privacy-preserving machine learning, primarily due to off-chip memory bandwidth limitations; existing hardware accelerators incur high area/power overheads and lack programmability. This paper presents an efficient full-stack FHE acceleration platform supporting mainstream schemes (CKKS, BGV, BFV). It introduces a novel streaming memory access mechanism and circuit-level functional unit reuse, designs compact NTT and automorphism-specific hardware units, and develops a high-SRAM-utilization, cross-scheme programmable FHE instruction set architecture (ISA) with compiler support for vectorized optimization. Evaluated on FPGA, the prototype achieves 1.22× average speedup over state-of-the-art accelerators. The ASIC implementation significantly outperforms prior work in both performance-per-area and performance-per-watt metrics.

📝 Abstract
Fully Homomorphic Encryption (FHE) is a set of powerful cryptographic schemes that allow computation to be performed directly on encrypted data to unlimited depth. Despite FHE's promise for privacy-preserving computing, in most FHE schemes the ciphertext is generally thousands of times larger than the original message, and bootstrapping and privacy-preserving machine learning applications (such as HELR and ResNet-20) load massive amounts of data from off-chip memory; both factors degrade the performance of FHE-based computation. Several hardware designs have been proposed to address this issue; however, most of them require enormous resources and power. An acceleration platform with easy programmability, high efficiency, and low overhead is a prerequisite for practical application. This paper proposes EFFACT, a highly efficient full-stack FHE acceleration platform with a compiler that provides comprehensive optimizations and vector-friendly hardware. We start by examining the computational overhead across different real-world benchmarks to highlight the potential benefits of reallocating computing resources for efficiency enhancement. We then perform a design space exploration to find an optimal SRAM size with high utilization and low cost. EFFACT also features a novel optimization, streaming memory access, which enables high throughput with limited SRAM. Regarding software-side optimization, we also propose a circuit-level function unit reuse scheme that substantially reduces computing resources without performance degradation. Moreover, we design novel NTT and automorphism units suited to a cost-sensitive and highly efficient architecture, leading to low area. For generality, EFFACT is also equipped with an ISA and a compiler backend that support several FHE schemes, such as CKKS, BGV, and BFV. We provide both FPGA and ASIC versions of EFFACT.
On account of our full-stack design, FPGA-EFFACT outperforms the SOTA FPGA accelerators by $1.22\times$ in geometric mean. Meanwhile, ASIC-EFFACT shows improvements in performance per chip area and performance per watt compared with the SOTA ASIC works.
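For readers unfamiliar with the two kernels the abstract singles out, the following is a minimal software sketch of what NTT and automorphism units compute in RLWE-based schemes such as CKKS, BGV, and BFV. It is not EFFACT's hardware design: the toy parameters (`N = 8`, `Q = 17`, `PSI = 3`), the naive quadratic-time transform, and all function names are assumptions chosen only for illustration, and they are far too small to be secure.

```python
# Toy parameters (assumptions for illustration only; not secure, not from the paper).
N = 8      # ring degree: polynomials live in Z_Q[X] / (X^N + 1)
Q = 17     # prime modulus with Q = 1 (mod 2N)
PSI = 3    # primitive 2N-th root of unity mod Q (3 has order 16 mod 17)

def ntt(a):
    """Naive O(N^2) negacyclic NTT: evaluate a(X) at the odd powers of PSI."""
    return [sum(a[j] * pow(PSI, j * (2 * k + 1), Q) for j in range(N)) % Q
            for k in range(N)]

def intt(a_hat):
    """Inverse of ntt(): interpolate the coefficients back from the evaluations."""
    n_inv = pow(N, -1, Q)        # modular inverse (Python 3.8+)
    psi_inv = pow(PSI, -1, Q)
    return [n_inv * sum(a_hat[k] * pow(psi_inv, j * (2 * k + 1), Q)
                        for k in range(N)) % Q
            for j in range(N)]

def mul_schoolbook(a, b):
    """Reference product in Z_Q[X]/(X^N + 1): wrap-around terms flip sign."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                c[k] = (c[k] + a[i] * b[j]) % Q
            else:
                c[k - N] = (c[k - N] - a[i] * b[j]) % Q
    return c

def mul_ntt(a, b):
    """Same product computed as a pointwise multiplication in the NTT domain."""
    return intt([x * y % Q for x, y in zip(ntt(a), ntt(b))])

def automorphism(a, g):
    """Apply X -> X^g (g odd) to a polynomial in coefficient form."""
    out = [0] * N
    for i, coeff in enumerate(a):
        idx = (i * g) % (2 * N)
        if idx < N:
            out[idx] = coeff % Q
        else:
            out[idx - N] = (-coeff) % Q   # X^N = -1 in this ring
    return out

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 0, 1, 0, 2, 0, 3, 0]
g = 5
product_matches = mul_ntt(a, b) == mul_schoolbook(a, b)
# In the NTT domain the automorphism is only a permutation of the slots.
a_hat = ntt(a)
permuted = [a_hat[((g * (2 * k + 1)) % (2 * N) - 1) // 2] for k in range(N)]
automorphism_is_permutation = ntt(automorphism(a, g)) == permuted
```

The sketch illustrates why these kernels matter: the NTT turns a polynomial product into a pointwise product, and an automorphism reduces to a data permutation once the operand is in the NTT domain, so both tend to be dominated by data movement and not by arithmetic.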
Problem

Research questions and friction points this paper is trying to address.

Addresses high ciphertext expansion in FHE schemes
Reduces resource and power demands in FHE hardware
Optimizes memory access and computation for FHE efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Full-stack FHE acceleration with compiler optimizations
Streaming memory access for high throughput
Circuit-level function unit reuse scheme
Authors

Yi Huang
Tsinghua University
Xinsheng Gong
Tsinghua University
Xiangyu Kong
Tsinghua University
Dibei Chen
Tsinghua University
Jianfeng Zhu
Tsinghua University
Wenping Zhu
Tsinghua University
Liangwei Li
Tsinghua University
Mingyu Gao
Tsinghua University
Computer Architecture, Memory Systems, Hardware Security, Domain-Specific Acceleration
Shaojun Wei
Professor, Tsinghua University
Aoyang Zhang
Tsinghua University
Leibo Liu
Prof. of Institute of Microelectronics, Tsinghua University
Reconfigurable Computing, Hardware Security and Cryptographic Processing