Practical and Private Hybrid ML Inference with Fully Homomorphic Encryption

📅 2025-09-01
🤖 AI Summary
To address key challenges in cloud-based fully homomorphic encryption (FHE) inference—including high bootstrapping overhead, low approximation accuracy of nonlinear activations, and insufficient model confidentiality—this paper proposes Safhire, a hybrid machine learning inference framework balancing privacy and efficiency. Methodologically, Safhire delegates linear layers to the server for ciphertext-domain FHE evaluation while offloading nonlinear activations to the client for exact plaintext computation, thereby eliminating bootstrapping entirely and enabling precise activation functions. It further introduces a randomized weight-reordering mechanism to protect model parameter confidentiality and integrates ciphertext batch encoding with partial unpacking to optimize communication bandwidth and inference latency. Experimental results demonstrate that Safhire reduces inference latency by 1.5×–10.5× compared to Orion, maintains lossless model accuracy, and achieves controlled communication overhead—validating its effective trade-off between security guarantees and practical deployability.
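The round-trip described above can be sketched in plain Python. This is a plaintext mock of the protocol flow only: the "encrypted" server computation is simulated with ordinary arithmetic (no real FHE library is used), and the layer sizes and weights are illustrative values, not from the paper.

```python
# Plaintext mock of a hybrid inference loop in the style of Safhire.
# Server evaluates linear layers (under FHE in the real system; mocked
# here), client computes exact non-linear activations in plaintext.

def matvec(W, x):
    """Server-side linear layer; in Safhire this runs on ciphertexts."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(x):
    """Client-side activation, computed exactly in plaintext."""
    return [max(0.0, v) for v in x]

def hybrid_inference(layers, x):
    """Alternate server linear layers with client-side activations."""
    for i, W in enumerate(layers):
        ct = matvec(W, x)      # server: FHE evaluation (mocked)
        if i < len(layers) - 1:
            x = relu(ct)       # client: decrypt, exact ReLU, re-encrypt
        else:
            x = ct             # final logits returned to the client
    return x

layers = [[[1.0, -2.0], [0.5, 1.0]],   # layer 1: 2x2 (made-up weights)
          [[1.0, 1.0]]]                # layer 2: 1x2
print(hybrid_inference(layers, [1.0, 1.0]))  # -> [1.5]
```

Because every activation is computed on decrypted values, no bootstrapping or polynomial approximation of ReLU is ever needed, which is the source of the latency savings the summary reports.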

📝 Abstract
In contemporary cloud-based services, protecting users' sensitive data and ensuring the confidentiality of the server's model are critical. Fully homomorphic encryption (FHE) enables inference directly on encrypted inputs, but its practicality is hindered by expensive bootstrapping and inefficient approximations of non-linear activations. We introduce Safhire, a hybrid inference framework that executes linear layers under encryption on the server while offloading non-linearities to the client in plaintext. This design eliminates bootstrapping, supports exact activations, and significantly reduces computation. To safeguard model confidentiality despite client access to intermediate outputs, Safhire applies randomized shuffling, which obfuscates intermediate values and makes it practically impossible to reconstruct the model. To further reduce latency, Safhire incorporates advanced optimizations such as fast ciphertext packing and partial extraction. Evaluations on multiple standard models and datasets show that Safhire achieves 1.5×–10.5× lower inference latency than Orion, a state-of-the-art baseline, with manageable communication overhead and comparable accuracy, thereby establishing the practicality of hybrid FHE inference.
Problem

Research questions and friction points this paper is trying to address.

Reducing FHE inference latency by eliminating bootstrapping operations
Enabling exact non-linear activations while maintaining privacy
Protecting model confidentiality despite client-side plaintext processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid framework offloads non-linearities to client
Randomized shuffling obfuscates intermediate values for confidentiality
Fast ciphertext packing and partial extraction reduce latency
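The shuffling idea in the list above relies on a simple property that can be checked in a few lines: elementwise activations commute with permutations, so the server can shuffle the intermediate values the client sees and undo the shuffle afterwards without changing the result. The sketch below illustrates only that correctness property; how Safhire derives the permutation and folds its inverse into later layers is not shown here and should be taken from the paper.

```python
# Illustrative check that a secret server-side permutation around a
# client-side elementwise activation preserves the computation.
import random

def relu(x):
    return [max(0.0, v) for v in x]

def shuffled_activation(x, seed=1234):
    rng = random.Random(seed)        # stands in for the server's secret
    perm = list(range(len(x)))
    rng.shuffle(perm)
    sent = [x[p] for p in perm]      # client sees only shuffled values
    acted = relu(sent)               # client applies the exact activation
    out = [0.0] * len(x)
    for i, p in enumerate(perm):     # server inverts the shuffle
        out[p] = acted[i]
    return out

x = [3.0, -1.0, 2.0, -0.5]
print(shuffled_activation(x))        # same as relu(x): [3.0, 0.0, 2.0, 0.0]
```

The client learns a permuted view of each layer's outputs but not which neuron produced which value, which is what obstructs reconstruction of the weight matrices.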