Optimized Layerwise Approximation for Efficient Private Inference on Fully Homomorphic Encryption

๐Ÿ“… 2023-10-16
๐Ÿ“ˆ Citations: 3
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing post-training approximation (PTA) methods for private inference under fully homomorphic encryption (FHE) apply uniformly high-degree polynomial approximations to every activation layer, incurring prohibitive computational overhead. This paper proposes optimized layerwise approximation (OLA), a retraining-free framework that models the actual input distribution of each activation function and uses dynamic programming to allocate a per-layer approximation degree in polynomial time, jointly optimizing accuracy loss and FHE inference latency. Experiments show OLA accelerates FHE-based inference by 3.02× for ResNet-20 and 2.82× for ResNet-32 over prior state-of-the-art uniform-degree implementations; for ConvNeXt on CIFAR-10, replacing GELU with only degree-3 polynomials preserves classification accuracy without modifying the backbone model.
๐Ÿ“ Abstract
Recent studies have explored the deployment of privacy-preserving deep neural networks utilizing homomorphic encryption (HE), especially for private inference (PI). Many works have attempted the approximation-aware training (AAT) approach in PI, changing the activation functions of a model to low-degree polynomials that are easier to compute on HE by allowing model retraining. However, due to constraints in the training environment, it is often necessary to consider post-training approximation (PTA), using the pre-trained parameters of the existing plaintext model without retraining. Existing PTA studies have uniformly approximated the activation function in all layers to a high degree to mitigate accuracy loss from approximation, leading to significant time consumption. This study proposes an optimized layerwise approximation (OLA), a systematic framework that optimizes both accuracy loss and time consumption by using different approximation polynomials for each layer in the PTA scenario. For efficient approximation, we reflect the layerwise impact on the classification accuracy by considering the actual input distribution of each activation function while constructing the optimization problem. Additionally, we provide a dynamic programming technique to solve the optimization problem and achieve the optimized layerwise degrees in polynomial time. As a result, the OLA method reduces inference times for the ResNet-20 model and the ResNet-32 model by 3.02 times and 2.82 times, respectively, compared to prior state-of-the-art implementations employing uniform degree polynomials. Furthermore, we successfully classified CIFAR-10 by replacing the GELU function in the ConvNeXt model with only 3-degree polynomials using the proposed method, without modifying the backbone model.
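The abstract's claim that a degree-3 polynomial can stand in for GELU can be illustrated with a simple least-squares fit. This is a sketch only: the fitting interval [-4, 4] and the least-squares method are assumptions for illustration, not the paper's actual procedure, which derives the range from each layer's observed input distribution.

```python
import math
import numpy as np

def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return np.array([v * 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x])

# Fit a degree-3 polynomial over an assumed input range [-4, 4];
# in the PTA setting this range would come from the layer's actual
# input distribution rather than a fixed guess.
xs = np.linspace(-4.0, 4.0, 2001)
coeffs = np.polyfit(xs, gelu(xs), deg=3)  # highest-degree coefficient first
p3 = np.poly1d(coeffs)

# Worst-case deviation of the cubic from exact GELU on the interval.
max_err = float(np.max(np.abs(p3(xs) - gelu(xs))))
```

Because GELU's odd part is exactly x/2, a symmetric fit recovers the linear coefficient 0.5 almost exactly; the residual error comes entirely from approximating the even part with a quadratic.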
Problem

Research questions and friction points this paper is trying to address.

Optimizing activation function approximations for efficient private inference
Reducing computational time while maintaining accuracy in homomorphic encryption
Developing layer-specific polynomial approximations without model retraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layerwise polynomial approximation for homomorphic encryption
Dynamic programming optimizes layer-specific polynomial degrees
Input distribution analysis improves accuracy-time tradeoff
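The degree-allocation idea above can be sketched as a knapsack-style dynamic program. This is a hedged illustration, not the paper's exact formulation: the per-layer latency and error tables, the total error budget, and its discretization step are all assumptions introduced here.

```python
def allocate_degrees(lat, err, budget, step=0.01):
    """Pick one polynomial degree per layer, minimizing total latency
    subject to a discretized total accuracy-loss budget.

    lat, err: lists of dicts, one per layer, mapping degree -> latency / loss.
    Runs in O(n_layers * n_budget_steps * n_degrees), i.e. polynomial time.
    """
    n = len(lat)
    B = int(round(budget / step))
    INF = float("inf")
    # dp[i][b]: min latency over the first i layers using <= b loss units.
    dp = [[INF] * (B + 1) for _ in range(n + 1)]
    choice = [[None] * (B + 1) for _ in range(n + 1)]
    for b in range(B + 1):
        dp[0][b] = 0.0
    for i in range(1, n + 1):
        for b in range(B + 1):
            for d, cost in lat[i - 1].items():
                e = int(round(err[i - 1][d] / step))  # loss in budget units
                if e <= b and dp[i - 1][b - e] + cost < dp[i][b]:
                    dp[i][b] = dp[i - 1][b - e] + cost
                    choice[i][b] = (d, b - e)
    if dp[n][B] == INF:
        return None  # no feasible assignment within the budget
    degrees, b = [], B
    for i in range(n, 0, -1):  # backtrack the chosen degree per layer
        d, b = choice[i][b]
        degrees.append(d)
    return list(reversed(degrees)), dp[n][B]

# Toy example: degree 7 is slow but accurate, degree 3 fast but lossy.
lat = [{3: 1.0, 7: 4.0}, {3: 1.0, 7: 4.0}]
err = [{3: 0.05, 7: 0.01}, {3: 0.05, 7: 0.01}]
degrees, latency = allocate_degrees(lat, err, budget=0.06)
```

With this budget, using degree 3 everywhere would exceed the loss limit and degree 7 everywhere is needlessly slow, so the DP mixes one fast and one accurate layer.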
๐Ÿ”Ž Similar Papers
No similar papers found.