🤖 AI Summary
To address key challenges in cloud-based fully homomorphic encryption (FHE) inference, namely expensive bootstrapping, inaccurate polynomial approximation of nonlinear activations, and the difficulty of preserving model confidentiality, this paper proposes Safhire, a hybrid machine learning inference framework that balances privacy and efficiency. Safhire evaluates linear layers under FHE on the server while offloading nonlinear activations to the client for exact plaintext computation, eliminating bootstrapping entirely and supporting exact activation functions. To keep the model confidential even though the client observes intermediate outputs, Safhire randomly shuffles (reorders) those outputs, making it practically impossible to reconstruct the model parameters; it further combines fast ciphertext packing with partial extraction to reduce communication volume and inference latency. Experiments on standard models and datasets show that Safhire achieves 1.5×–10.5× lower inference latency than Orion, a state-of-the-art baseline, with no loss in accuracy and manageable communication overhead, demonstrating a practical trade-off between security guarantees and deployability.
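The shuffling idea in the summary can be sketched in plain NumPy. This is an illustrative toy, not Safhire's actual protocol: the linear layers here run in the clear (in Safhire they run under FHE on the server), the layer sizes and names (`W1`, `W2`, `perm`) are made up, and the point is only to show why shuffling does not break correctness: an elementwise activation commutes with any permutation, and the server can fold the inverse permutation into the next layer's weights so the client never learns the true ordering.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy linear layers; in Safhire these would be evaluated under FHE.
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))
x = rng.standard_normal(3)

relu = lambda v: np.maximum(v, 0.0)  # exact activation, computed by the client

# Baseline: no shuffling.
ref = W2 @ relu(W1 @ x)

# Shuffled variant: the server permutes layer-1 outputs before sending them,
# so the client sees intermediate values in a random, secret order.
perm = rng.permutation(4)
P = np.eye(4)[perm]                  # permutation matrix

shuffled = P @ (W1 @ x)              # what the client receives
client_out = relu(shuffled)          # elementwise, so order does not matter

# The server folds the inverse permutation into the next layer's weights,
# restoring correctness without ever revealing the permutation.
W2_folded = W2 @ P.T
out = W2_folded @ client_out

assert np.allclose(out, ref)
```

The key identity is `relu(P @ v) == P @ relu(v)` for any permutation matrix `P`, so `W2 @ P.T` undoes the shuffle exactly.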
📝 Abstract
In contemporary cloud-based services, protecting users' sensitive data and ensuring the confidentiality of the server's model are critical. Fully homomorphic encryption (FHE) enables inference directly on encrypted inputs, but its practicality is hindered by expensive bootstrapping and inefficient approximations of non-linear activations. We introduce Safhire, a hybrid inference framework that executes linear layers under encryption on the server while offloading non-linearities to the client in plaintext. This design eliminates bootstrapping, supports exact activations, and significantly reduces computation. To safeguard model confidentiality despite client access to intermediate outputs, Safhire applies randomized shuffling, which obfuscates intermediate values and makes it practically impossible to reconstruct the model. To further reduce latency, Safhire incorporates advanced optimizations such as fast ciphertext packing and partial extraction. Evaluations on multiple standard models and datasets show that Safhire achieves 1.5×–10.5× lower inference latency than Orion, a state-of-the-art baseline, with manageable communication overhead and comparable accuracy, thereby establishing the practicality of hybrid FHE inference.
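Why does offloading non-linearities eliminate bootstrapping? In leveled FHE schemes such as CKKS, a fresh ciphertext carries a small budget of multiplicative levels; each multiplicative layer consumes one, and bootstrapping is the expensive operation that refreshes an exhausted budget. The toy accounting below illustrates the contrast; the level budget, layer count, and function names are invented for illustration and do not reflect Safhire's actual parameters.

```python
LEVELS = 2       # assumed fresh-ciphertext level budget (illustrative)
NUM_LAYERS = 8   # linear layers in a toy model

def bootstraps_fully_fhe(layers, budget):
    """Fully-FHE inference: everything stays encrypted end to end,
    so the level budget must be refreshed by bootstrapping."""
    boots, level = 0, budget
    for _ in range(layers):
        if level == 0:
            boots += 1       # budget exhausted: bootstrap to refresh it
            level = budget
        level -= 1           # each multiplicative layer consumes one level
    return boots

def bootstraps_hybrid(layers, budget):
    """Safhire-style hybrid: after every linear layer the still-shallow
    ciphertext goes to the client, which decrypts, applies the exact
    activation in plaintext, and re-encrypts at the full budget. The
    server never runs out of levels, so bootstrapping never triggers."""
    assert budget >= 1
    return 0

print(bootstraps_fully_fhe(NUM_LAYERS, LEVELS))  # → 3
print(bootstraps_hybrid(NUM_LAYERS, LEVELS))     # → 0
```

The round trips to the client are what Safhire pays instead, which is why the ciphertext packing and partial-extraction optimizations mentioned above matter for keeping communication overhead manageable.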