🤖 AI Summary
To address the low execution efficiency and high energy consumption of CKKS-targeting symmetric ciphers (HERA and Rubato) on client devices in hybrid homomorphic encryption (HHE), this paper proposes an FPGA-accelerated hardware architecture. The design employs vectorization and deep pipelining. It exploits the transposition invariance of the MixColumns and MixRows functions to reorder the intermediate state and eliminate pipeline stalls. It also decouples random number generation from key computation, which hides RNG latency and shortens the critical path. Implemented on an AMD Virtex UltraScale+ FPGA, it leverages low-latency FIFOs and parallel computation. Compared with software implementations, Rubato achieves 6× higher throughput, 5× lower latency, and 75× lower energy consumption, while HERA achieves 6× higher throughput, 3× lower latency, and 47× lower energy consumption. The proposed architecture substantially improves the energy efficiency and real-time performance of HHE clients.
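Why decoupling the RNG from key computation hides latency can be illustrated with a simple cycle-count model. The sketch below is a hypothetical two-stage model and not the paper's design. `t_rng` and `t_key` are assumed per-block cycle costs of the RNG and key-computation stages, and `fifo_depth` is an assumed FIFO capacity between them. It models only latency hiding. The critical-path (operating-frequency) benefit is not captured.

```python
def coupled_cycles(n_blocks: int, t_rng: int, t_key: int) -> int:
    """Hypothetical baseline: RNG and key computation run back-to-back per block."""
    return n_blocks * (t_rng + t_key)


def decoupled_cycles(n_blocks: int, t_rng: int, t_key: int, fifo_depth: int = 2) -> int:
    """Hypothetical decoupled schedule: the RNG stage runs ahead of the key stage.

    The RNG may start block i once it has finished block i-1 and the FIFO has a
    free slot, i.e. the key stage has already popped block i - fifo_depth.
    The key stage pops block i once it is idle and block i is in the FIFO.
    """
    assert fifo_depth >= 1
    rng_done: list[int] = []  # cycle at which block i is pushed into the FIFO
    pop: list[int] = []       # cycle at which the key stage pops block i
    key_done = 0              # cycle at which the key stage finished its last block
    for i in range(n_blocks):
        start = rng_done[-1] if rng_done else 0
        if i >= fifo_depth:
            start = max(start, pop[i - fifo_depth])  # wait for a free FIFO slot
        rng_done.append(start + t_rng)
        pop.append(max(key_done, rng_done[i]))
        key_done = pop[i] + t_key
    return key_done


if __name__ == "__main__":
    print(coupled_cycles(4, 10, 30))    # 160
    print(decoupled_cycles(4, 10, 30))  # 130
```

With the hypothetical costs `t_rng = 10` and `t_key = 30` over 4 blocks, the coupled schedule needs 160 cycles and the decoupled one needs 130. Only the first block's RNG latency remains exposed, which matches the usual two-stage pipeline bound t_rng + t_key + (n - 1) · max(t_rng, t_key).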
📝 Abstract
Hybrid Homomorphic Encryption (HHE) combines symmetric-key encryption and homomorphic encryption (HE) to reduce ciphertext expansion, which is crucial in client-server deployments of HE. Special symmetric ciphers that are amenable to efficient HE evaluation have been developed for this purpose. Their client-side deployment calls for performant and energy-efficient implementations. In this paper, we develop and evaluate hardware accelerators for the two known CKKS-targeting HHE ciphers, HERA and Rubato.
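To show why HHE avoids ciphertext expansion on the client, here is a minimal sketch of client-side encryption. It assumes the additive scale-and-round encryption of the RtF-style transciphering framework that HERA and Rubato target: c_i = round(Δ · m_i) + k_i mod q. The modulus `Q`, the scale `DELTA`, and `placeholder_keystream` are hypothetical. The keystream here is plain randomness standing in for a real HERA/Rubato keystream, and Rubato's added keystream noise is omitted. Each real-valued message slot becomes a single element of Z_q.

```python
import secrets

# Hypothetical toy parameters (not HERA/Rubato parameter sets).
Q = 2**31 - 1   # modulus q of the cipher's plaintext space
DELTA = 2**16   # scaling factor applied to real-valued messages


def placeholder_keystream(n: int) -> list[int]:
    """Stand-in for a HERA/Rubato keystream block: n uniform elements of Z_q.

    A real client derives this from the secret key and a nonce via the cipher.
    Fresh randomness is used here only to keep the sketch self-contained.
    """
    return [secrets.randbelow(Q) for _ in range(n)]


def encrypt(msg: list[float], ks: list[int]) -> list[int]:
    # c_i = round(DELTA * m_i) + k_i mod q: one Z_q element per message slot,
    # so the ciphertext is len(msg) * ceil(log2 q) bits.
    return [(round(DELTA * m) + k) % Q for m, k in zip(msg, ks)]


def decrypt(ct: list[int], ks: list[int]) -> list[float]:
    out = []
    for c, k in zip(ct, ks):
        v = (c - k) % Q
        if v > Q // 2:  # map back to a signed representative
            v -= Q
        out.append(v / DELTA)
    return out


if __name__ == "__main__":
    msg = [0.5, -0.25, 0.123456, -1.0]
    ks = placeholder_keystream(len(msg))
    ct = encrypt(msg, ks)
    print(ct)                # four Z_q elements
    print(decrypt(ct, ks))   # each within 1 / (2 * DELTA) of msg
```

In the HHE setting, the server would instead evaluate the cipher homomorphically to obtain the keystream under CKKS and subtract it from `ct`. The client-side work reduces to keystream generation, which is exactly what the accelerator targets.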
We design vectorized and overlapped functional modules. The design exploits the transposition-invariance property of the MixColumns and MixRows functions and alternates the order of the intermediate state to eliminate bubbles in keystream generation, improving both latency and throughput. We decouple the RNG and key computation phases to hide the latency of the RNG and to shorten the critical path in the FIFOs, achieving a higher operating frequency.
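To make the transposition-invariance property concrete, here is a minimal sketch assuming a HERA-style 4×4 state over Z_q. In this model, MixColumns computes M · X and MixRows computes X · M^T for the same matrix M. The AES-style circulant `M` and the modulus `Q` are assumptions chosen for illustration, and Rubato uses larger states. Two identities follow. First, MixRows(X) = MixColumns(X^T)^T, so one datapath can serve both. Second, the combined linear layer L(X) = M · X · M^T satisfies L(X^T) = L(X)^T.

```python
import random

Q = 2**31 - 1  # hypothetical modulus for illustration
# AES-style circulant matrix, assumed here as the MixColumns/MixRows matrix.
M = [[2, 3, 1, 1],
     [1, 2, 3, 1],
     [1, 1, 2, 3],
     [3, 1, 1, 2]]


def matmul(a, b):
    """Matrix product a · b with every entry reduced mod Q."""
    k, m = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(k)) % Q for j in range(m)]
            for row in a]


def transpose(x):
    return [list(col) for col in zip(*x)]


def mix_columns(x):
    # Multiply every column of the state by M: M · X.
    return matmul(M, x)


def mix_rows(x):
    # Multiply every row of the state by M: X · M^T.
    return matmul(x, transpose(M))


def linear_layer(x):
    # MixColumns followed by MixRows: M · X · M^T.
    return mix_rows(mix_columns(x))


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.randrange(Q) for _ in range(4)] for _ in range(4)]
    # MixRows is MixColumns applied to the transposed state.
    assert mix_rows(X) == transpose(mix_columns(transpose(X)))
    # The linear layer commutes with transposition: L(X^T) = L(X)^T.
    assert linear_layer(transpose(X)) == transpose(linear_layer(X))
    print("transposition invariance holds")
```

The S-box layer acts element-wise, so it also commutes with transposition. Under these assumptions, consecutive rounds can therefore consume the state in alternating orientation instead of spending cycles on an explicit transpose, provided the round constants are supplied in the matching order. The paper's actual datapath may organize this differently.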
We implement the accelerator on an AMD Virtex UltraScale+ FPGA. Both Rubato and HERA achieve a 6x improvement in throughput compared to their software implementations. In terms of latency, Rubato achieves a 5x reduction and HERA a 3x reduction. Additionally, our hardware implementations reduce energy consumption by 75x for Rubato and 47x for HERA compared to their software implementations.