EncFormer: Secure and Efficient Transformer Inference over Encrypted Data

📅 2026-04-10
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the risk of leaking sensitive user data during Transformer inference in machine-learning-as-a-service (MLaaS) settings, where existing solutions based on fully homomorphic encryption (FHE) and secure multi-party computation (MPC) suffer from low efficiency, high communication overhead, and costly FHE–MPC conversions. To overcome these limitations, the authors propose EncFormer, a two-party framework for private Transformer inference that introduces Stage Compatible Patterns so that FHE kernels compose efficiently, minimizing repacking and FHE–MPC switching. They build a cost analysis model around a minimal-conversion baseline to guide the selection of FHE–MPC boundaries, and design a secure complex CKKS-MPC conversion protocol alongside communication-efficient MPC protocols for nonlinear operations, with GPU optimizations. Experiments show that EncFormer reduces online MPC communication by 1.4–30.4× and end-to-end latency by 1.3–9.8× relative to prior hybrid FHE–MPC systems on GPT- and BERT-style models, and achieves 1.9–3.5× lower end-to-end latency than FHE-only pipelines on BERT-base under a matched backend, while maintaining near-plaintext accuracy on selected GLUE tasks.

📝 Abstract
Transformer inference in machine-learning-as-a-service (MLaaS) raises privacy concerns for sensitive user inputs. Prior secure solutions that combine fully homomorphic encryption (FHE) and secure multiparty computation (MPC) are bottlenecked by inefficient FHE kernels, communication-heavy MPC protocols, and expensive FHE-MPC conversions. We present EncFormer, a two-party private Transformer inference framework that introduces Stage Compatible Patterns so that FHE kernels compose efficiently, reducing repacking and conversions. EncFormer also provides a cost analysis model built around a minimal-conversion baseline, enabling principled selection of FHE-MPC boundaries. To further reduce communication, EncFormer proposes a secure complex CKKS-MPC conversion protocol and designs communication-efficient MPC protocols for nonlinearities. With GPU optimizations, evaluations on GPT- and BERT-style models show that EncFormer achieves 1.4x-30.4x lower online MPC communication and 1.3x-9.8x lower end-to-end latency against prior hybrid FHE-MPC systems, and 1.9x-3.5x lower end-to-end latency on BERT-base than FHE-only pipelines under a matched backend, while maintaining near-plaintext accuracy on selected GLUE tasks.
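The abstract describes the cost model only at a high level, so the following is a generic illustration of boundary selection and not the paper's actual model. It assumes each pipeline stage has a known cost under FHE and under MPC, and that every switch between the two incurs one fixed conversion cost. The function name `choose_backends` and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def choose_backends(
    fhe_cost: List[float],
    mpc_cost: List[float],
    conv_cost: float,
) -> Tuple[float, List[str]]:
    """Pick a backend per stage, minimising stage costs plus switch costs.

    Hypothetical helper: a simple dynamic program over stages, assuming
    a single fixed cost for every FHE<->MPC conversion.
    """
    costs: Dict[str, List[float]] = {"FHE": fhe_cost, "MPC": mpc_cost}
    # best[b] = (cheapest total so far for a plan ending in backend b, that plan)
    best = {b: (costs[b][0], [b]) for b in costs}
    for i in range(1, len(fhe_cost)):
        new_best = {}
        for b in costs:
            candidates = []
            for prev, (total, plan) in best.items():
                switch = 0.0 if prev == b else conv_cost
                candidates.append((total + switch + costs[b][i], plan + [b]))
            new_best[b] = min(candidates, key=lambda t: t[0])
        best = new_best
    return min(best.values(), key=lambda t: t[0])


# Made-up per-stage costs for: linear, nonlinear, linear, nonlinear.
fhe = [1.0, 9.0, 1.0, 8.0]
mpc = [4.0, 2.0, 4.0, 2.0]

cheap_total, cheap_plan = choose_backends(fhe, mpc, conv_cost=1.0)
costly_total, costly_plan = choose_backends(fhe, mpc, conv_cost=5.0)
print(cheap_total, cheap_plan)
print(costly_total, costly_plan)
```

With these made-up numbers, a cheap conversion makes it worthwhile to alternate backends at every stage, while an expensive conversion keeps the whole pipeline in one backend. This is the kind of trade-off a conversion-cost model is meant to expose.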
Problem

Research questions and friction points this paper is trying to address.

Transformer inference
privacy
fully homomorphic encryption
secure multiparty computation
machine-learning-as-a-service
Innovation

Methods, ideas, or system contributions that make the work stand out.

EncFormer
Fully Homomorphic Encryption (FHE)
Secure Multiparty Computation (MPC)
Transformer Inference
Privacy-Preserving Machine Learning