🤖 AI Summary
In oblivious message retrieval (OMR), matrix-vector multiplication under fully homomorphic encryption (FHE) carries a computational overhead high enough to hinder practical deployment. To address this, the work proposes an FPGA-based hardware acceleration framework tailored to metadata privacy protection. It presents the first implementation based on high-level synthesis (HLS) of high-order homomorphic operations on FPGA, including ciphertext rotation and ciphertext-plaintext multiplication, with a tunable-parallelism architecture and an efficient design-space exploration methodology. Compared to a CPU-based software implementation, the accelerator achieves a 13.86× speedup in FHE ciphertext scanning throughput. The result eases the performance bottleneck of FHE-based ciphertext processing on OMR servers and provides a practical hardware foundation for real-world metadata privacy-preserving systems.
📝 Abstract
While end-to-end encryption protects the content of messages, it does not secure metadata, which exposes sender and receiver information through traffic analysis. A plausible approach to protecting this metadata is to have senders post encrypted messages on a public bulletin board and receivers scan it for relevant messages. Oblivious message retrieval (OMR) leverages homomorphic encryption (HE) to improve the user experience of this solution by delegating the scan to a resource-rich server while preserving privacy. A key process in OMR is the homomorphic detection, on the bulletin board, of the messages pertinent to the receiver. It relies on a specialized matrix-vector multiplication algorithm, which involves extensive multiplications between ciphertext vectors and plaintext matrices, as well as homomorphic rotations. The computationally intensive nature of this process limits the practicality of OMR. To address this challenge, this paper proposes a hardware architecture to accelerate the matrix-vector multiplication algorithm. The homomorphic operators that serve as building blocks of this algorithm are implemented using high-level synthesis, with design parameters that expose different levels of parallelism. These operators are then deployed on a field-programmable gate array platform using an efficient design space exploration strategy to accelerate homomorphic matrix-vector multiplication. Compared to a software implementation, the proposed hardware accelerator achieves a 13.86× speedup.
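To make the structure of such a matrix-vector product concrete, the following is a minimal plaintext sketch of the well-known diagonal method, in which a square plaintext matrix is split into generalized diagonals and each diagonal is multiplied slot-wise with a rotated copy of the vector. This is an assumption for illustration: the abstract does not say which specialized algorithm the paper uses, and the function names (`rotate`, `diagonal`, `matvec_diagonal`) are hypothetical. Ordinary Python lists stand in for ciphertext slots, so nothing here is encrypted.

```python
def rotate(vec, k):
    """Cyclic left rotation by k slots; stands in for a homomorphic rotation."""
    k %= len(vec)
    return vec[k:] + vec[:k]


def diagonal(matrix, i):
    """Return the i-th generalized diagonal: entry j is matrix[j][(j + i) % n]."""
    n = len(matrix)
    return [matrix[j][(j + i) % n] for j in range(n)]


def matvec_diagonal(matrix, vec):
    """Compute matrix @ vec using only rotations, slot-wise products, and sums.

    Each loop iteration mirrors one ciphertext rotation followed by one
    ciphertext-plaintext multiplication and one addition (illustrative
    assumption, not the paper's exact schedule).
    """
    n = len(vec)
    acc = [0] * n
    for i in range(n):
        diag = diagonal(matrix, i)   # plaintext operand
        rot = rotate(vec, i)         # rotated "ciphertext" operand
        acc = [a + d * r for a, d, r in zip(acc, diag, rot)]
    return acc


def matvec_reference(matrix, vec):
    """Plain row-by-row product, used only to check the sketch."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


if __name__ == "__main__":
    M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    v = [1, 0, 2]
    assert matvec_diagonal(M, v) == matvec_reference(M, v)
```

In this formulation the loop iterations are independent apart from the final accumulation, which is the kind of structure a hardware design could unroll to different degrees; how the paper's parallelism parameters actually map onto the algorithm is not stated in the abstract.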