🤖 AI Summary
To address the high computational overhead of deep neural network (DNN) inference under fully homomorphic encryption (FHE), and the resulting tension between accuracy and efficiency, this paper pioneers the systematic integration of unstructured sparsity into FHE-based matrix multiplication. We propose three FHE-native sparse multiplication schemes and show that, unlike plaintext sparse computation, where the overhead of the sparse algorithm must be weighed against the degree of sparsity, they outperform a naive baseline at every sparsity level. Our approach jointly optimizes sparse encoding (CSR, COO, and block-sparse formats), multithreaded ciphertext-level parallelism, and co-design of weight compression and homomorphic encryption operations. Experiments demonstrate an average 2.5× speedup at 50% sparsity, up to 32.5× acceleration on 64 cores versus single-threaded sparse execution, and, for some schemes, a significant reduction in the memory footprint of the encrypted matrix that grows with sparsity. Crucially, we are the first to empirically validate the effectiveness of sparsity across the full sparsity spectrum in FHE and to show that computational efficiency and memory usage can be optimized simultaneously.
📝 Abstract
The deployment of deep neural networks (DNNs) in privacy-sensitive environments is constrained by computational overheads in fully homomorphic encryption (FHE). This paper explores unstructured sparsity in FHE matrix multiplication schemes as a means of reducing this burden while meeting model accuracy requirements. We demonstrate that sparsity can be exploited in arbitrary matrix multiplication, providing runtime benefits compared to a baseline naive algorithm at all sparsity levels. This is a notable departure from the plaintext domain, where there is a trade-off between sparsity and the overhead of the sparse multiplication algorithm. In addition, we propose three sparse multiplication schemes in FHE based on common plaintext sparse encodings. We demonstrate that the performance gain is scheme-invariant; however, some sparse schemes vastly reduce the memory storage requirements of the encrypted matrix at high sparsity levels. Our proposed sparse schemes yield an average performance gain of 2.5x at 50% unstructured sparsity, with our multi-threading scheme providing a 32.5x performance increase over the equivalent single-threaded sparse computation when utilizing 64 cores.
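The intuition behind the "benefit at all sparsity levels" claim can be illustrated with a small plaintext sketch. It is not the paper's implementation: it uses no FHE library, and the helper names (`to_csr`, `dense_matvec`, `csr_matvec`, `random_sparse`) are hypothetical. It assumes that each counted scalar multiplication stands in for one homomorphic multiplication, which is far more expensive than the index bookkeeping of a sparse format, so every skipped zero is a saving. Ciphertext packing, rotations, and noise growth are not modelled.

```python
import random


def to_csr(dense):
    """Encode a dense row-major matrix as CSR: (values, col_indices, row_ptr)."""
    values, cols, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                cols.append(j)
        row_ptr.append(len(values))  # end offset of this row in `values`
    return values, cols, row_ptr


def dense_matvec(dense, x):
    """Naive baseline: one multiplication per matrix entry, zeros included."""
    mults, out = 0, []
    for row in dense:
        acc = 0
        for w, xj in zip(row, x):
            acc += w * xj  # stand-in for one homomorphic multiplication
            mults += 1
        out.append(acc)
    return out, mults


def csr_matvec(csr, x):
    """Sparse product: one multiplication per stored nonzero only."""
    values, cols, row_ptr = csr
    mults, out = 0, []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[cols[k]]  # stand-in for one homomorphic multiplication
            mults += 1
        out.append(acc)
    return out, mults


def random_sparse(n_rows, n_cols, sparsity, seed=0):
    """Integer matrix in which each entry is zero with probability `sparsity`."""
    rng = random.Random(seed)
    return [
        [0 if rng.random() < sparsity else rng.randint(1, 9) for _ in range(n_cols)]
        for _ in range(n_rows)
    ]


W = random_sparse(8, 8, sparsity=0.5)
x = list(range(1, 9))
y_dense, dense_mults = dense_matvec(W, x)
y_sparse, sparse_mults = csr_matvec(to_csr(W), x)
assert y_dense == y_sparse  # same result, fewer multiplications
print(f"dense multiplications: {dense_mults}, sparse multiplications: {sparse_mults}")
```

In this toy model the sparse multiplication count equals the number of nonzero weights, so the saving grows with sparsity. The CSR arrays also store only the nonzero entries, which mirrors the abstract's point that some encodings shrink the stored matrix at high sparsity.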