🤖 AI Summary
To address the high computational overhead of client-side fully homomorphic encryption (FHE) under CKKS bootstrappable parameters, and the difficulty of jointly optimizing memory access and latency, this work proposes ABC-FHE, an area- and power-efficient hardware accelerator. The design combines a reconfigurable Fourier engine that supports both NTT and FFT, an on-chip pseudorandom number generator (PRNG), and an on-the-fly twiddle factor generator in a streaming architecture whose fine-grained task scheduling minimizes off-chip memory traffic and redundant computation. In 28 nm technology, the design occupies 28.638 mm² and consumes 5.654 W. For encoding and encryption, it is 1112× faster than a CPU and 214× faster than the state-of-the-art client-side accelerator; for decoding and decryption, the speed-ups are 963× and 82×, respectively. These results point toward efficient hardware support for lightweight FHE deployment on resource-constrained terminals.
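The summary credits the on-chip PRNG with reducing off-chip memory traffic. One common way a PRNG achieves this in RLWE-based schemes such as CKKS is seed expansion: a random polynomial that encryption consumes (for example, a uniform mask polynomial) is regenerated on demand from a short seed instead of being stored and fetched. The sketch below is a minimal illustration under stated assumptions: SHAKE-128 serves as the extendable-output PRNG, rejection sampling keeps coefficients uniform in [0, q), and `expand_uniform_poly` is a hypothetical helper. The source does not say which PRNG ABC-FHE implements or how it samples.

```python
import hashlib

def expand_uniform_poly(seed: bytes, n: int, q: int) -> list[int]:
    """Deterministically expand `seed` into n coefficients uniform in [0, q).

    Hypothetical helper: SHAKE-128 is the extendable-output PRNG (an assumption),
    and out-of-range candidates are rejected so the result stays unbiased.
    """
    assert n > 0 and q > 1
    bits = q.bit_length()
    nbytes = (bits + 7) // 8          # bytes consumed per candidate
    mask = (1 << bits) - 1            # keep exactly `bits` bits, so P(accept) >= 1/2
    stream_len = 4 * n * nbytes       # generous first guess at the stream length
    while True:
        # A longer SHAKE output extends a shorter one, so retries stay deterministic.
        stream = hashlib.shake_128(seed).digest(stream_len)
        coeffs = []
        for off in range(0, stream_len - nbytes + 1, nbytes):
            c = int.from_bytes(stream[off:off + nbytes], "little") & mask
            if c < q:                 # rejection sampling: discard c >= q
                coeffs.append(c)
                if len(coeffs) == n:
                    return coeffs
        stream_len *= 2               # unlucky draw: request a longer stream
```

Any party holding the same seed regenerates an identical polynomial, so only the seed (for example, 32 bytes) has to be kept or transmitted, instead of n coefficients of ⌈log2 q⌉ bits each.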
📝 Abstract
As the demand for privacy-preserving computation continues to grow, fully homomorphic encryption (FHE), which enables continuous computation on encrypted data, has become a critical solution. However, its adoption is hindered by significant computational overhead: FHE requires 10,000-fold more computation than plaintext processing. Recent FHE accelerators have substantially improved server-side performance, but client-side computation remains a bottleneck, particularly under bootstrappable parameter configurations, which require encoding, encryption, decoding, and decryption with large parameters. To address this challenge, we propose ABC-FHE, an area- and power-efficient FHE accelerator that supports bootstrappable parameters on the client side. ABC-FHE employs a streaming architecture to maximize performance density, minimize area, and reduce off-chip memory access. Key innovations include a reconfigurable Fourier engine capable of switching between NTT and FFT modes. In addition, an on-chip pseudo-random number generator and a unified on-the-fly twiddle factor generator significantly reduce memory demands, while optimized task scheduling lowers the latency of CKKS client-side processing. Overall, ABC-FHE occupies a die area of 28.638 mm² and consumes 5.654 W in 28 nm technology. For encoding and encryption, it achieves a 1112× speed-up in execution time over a CPU and 214× over the state-of-the-art client-side accelerator. For decoding and decryption, it achieves a 963× speed-up over the CPU and 82× over the state-of-the-art accelerator.
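A reconfigurable NTT/FFT engine is plausible because both transforms share the same radix-2 butterfly dataflow and differ only in their arithmetic: integers modulo q for the NTT, complex numbers for the FFT. The minimal Python sketch below illustrates that idea with one iterative Cooley-Tukey loop parameterized by the arithmetic. Twiddle factors are generated on the fly by repeated multiplication with a per-stage root instead of being read from a stored table, which loosely mirrors the motivation for an on-the-fly twiddle generator. The names `fourier`, `ntt`, and `fft` are hypothetical, and the sketch uses the plain cyclic transform. CKKS itself uses a negacyclic NTT and a special FFT for its canonical embedding, and the source does not describe ABC-FHE's actual datapath.

```python
import cmath

def bit_reverse_permute(xs):
    """Return a copy of xs in bit-reversed index order (len must be a power of 2)."""
    out = list(xs)
    n, j = len(out), 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:                # increment j as a bit-reversed counter
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            out[i], out[j] = out[j], out[i]
    return out

def fourier(xs, root, mul, add, sub, one):
    """Iterative radix-2 Cooley-Tukey transform, generic over the arithmetic.

    `root` must be a primitive n-th root of unity in the chosen ring. Only
    log2(n) stage roots are precomputed; per-butterfly twiddles are generated
    on the fly by repeated multiplication instead of being read from a table.
    """
    n = len(xs)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    log_n = n.bit_length() - 1
    a = bit_reverse_permute(xs)
    stage_roots = [root]                      # stage_roots[k] = root ** (2 ** k)
    for _ in range(log_n - 1):
        stage_roots.append(mul(stage_roots[-1], stage_roots[-1]))
    for s in range(1, log_n + 1):
        m, half = 1 << s, 1 << (s - 1)
        w_m = stage_roots[log_n - s]          # primitive m-th root: root ** (n // m)
        for start in range(0, n, m):
            w = one
            for j in range(half):
                u, t = a[start + j], mul(w, a[start + j + half])
                a[start + j] = add(u, t)         # butterfly output u + w*v
                a[start + j + half] = sub(u, t)  # butterfly output u - w*v
                w = mul(w, w_m)                  # next twiddle, generated on the fly
    return a

def ntt(xs, q, omega):
    """NTT mode: arithmetic mod q; omega is a primitive n-th root of unity mod q."""
    return fourier([x % q for x in xs], omega,
                   mul=lambda x, y: x * y % q,
                   add=lambda x, y: (x + y) % q,
                   sub=lambda x, y: (x - y) % q,
                   one=1)

def fft(xs):
    """FFT mode: complex arithmetic with root exp(-2*pi*i/n)."""
    n = len(xs)
    return fourier([complex(x) for x in xs], cmath.exp(-2j * cmath.pi / n),
                   mul=lambda x, y: x * y,
                   add=lambda x, y: x + y,
                   sub=lambda x, y: x - y,
                   one=1 + 0j)
```

With q = 17 and omega = 2 (a primitive 8th root of unity mod 17), `ntt` on an 8-element input matches a naive O(n²) DFT mod 17, and `fft` matches the complex DFT up to rounding. In floating point, generating twiddles by repeated multiplication accumulates error as n grows, so a hardware FFT-mode generator would need some way to bound that drift.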