EncFormer: Secure and Efficient Transformer Inference over Encrypted Data

📅 2026-04-10

🤖 AI Summary

This work addresses the leakage of sensitive user data during Transformer inference in machine-learning-as-a-service (MLaaS). Existing solutions that combine fully homomorphic encryption (FHE) and secure multi-party computation (MPC) suffer from inefficient FHE kernels, high communication overhead, and costly FHE–MPC conversions. To overcome these limitations, the authors propose EncFormer, a two-party framework for private Transformer inference. It introduces Stage Compatible Patterns so that FHE kernels compose efficiently, which minimizes repacking and FHE–MPC switching. The authors build a cost analysis model around a minimal-conversion baseline to guide the choice of FHE–MPC boundaries, and they design a secure complex-number CKKS-to-MPC conversion protocol together with communication-efficient MPC protocols for nonlinear operations, all accelerated on GPUs. On GPT- and BERT-style models, EncFormer reduces online MPC communication by 1.4–30.4× and end-to-end latency by 1.3–9.8× compared with prior hybrid FHE–MPC systems. On BERT-base it also achieves 1.9–3.5× lower end-to-end latency than FHE-only pipelines under a matched backend, while keeping accuracy on selected GLUE tasks close to plaintext execution.
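
As a toy illustration of cost-driven boundary selection (not the paper's cost model), the sketch below treats a layer sequence as a chain in which each operator runs either under FHE or under MPC, charges a fixed cost for every FHE–MPC switch, and uses dynamic programming to find the cheapest assignment. The per-operator costs, the symmetric `switch_cost`, and the function name `plan_boundaries` are all hypothetical; a real model would derive its costs from measured kernel latency, communication volume, and conversion overhead.

```python
from typing import Dict, List, Tuple

FHE, MPC = "FHE", "MPC"
DOMAINS = (FHE, MPC)

# (operator name, hypothetical cost under FHE, hypothetical cost under MPC)
Op = Tuple[str, float, float]


def plan_boundaries(ops: List[Op], switch_cost: float, start: str = FHE) -> Tuple[float, List[str]]:
    """Pick FHE or MPC for each operator to minimize compute cost plus conversion cost.

    Every change of domain between consecutive operators (and away from `start`
    before the first operator) is charged `switch_cost`. The final domain is unconstrained.
    """
    if not ops:
        return 0.0, []
    # cost[d]: cheapest total so far with the data currently held in domain d.
    cost: Dict[str, float] = {d: (0.0 if d == start else switch_cost) for d in DOMAINS}
    # parents[i][d]: domain the data was in before op i, on the best path that runs op i in d.
    parents: List[Dict[str, str]] = []
    for _name, c_fhe, c_mpc in ops:
        op_cost = {FHE: c_fhe, MPC: c_mpc}
        new_cost: Dict[str, float] = {}
        parent: Dict[str, str] = {}
        for d in DOMAINS:
            # Either stay in d or pay one conversion to move into d.
            prev = min(DOMAINS, key=lambda p: cost[p] + (0.0 if p == d else switch_cost))
            new_cost[d] = cost[prev] + (0.0 if prev == d else switch_cost) + op_cost[d]
            parent[d] = prev
        cost = new_cost
        parents.append(parent)
    # Backtrack from the cheaper final domain; parents[0] only records the state before op 0.
    last = min(DOMAINS, key=lambda d: cost[d])
    plan = [last]
    for parent in reversed(parents[1:]):
        plan.append(parent[plan[-1]])
    plan.reverse()
    return cost[last], plan


if __name__ == "__main__":
    ops = [("matmul", 1.0, 10.0), ("softmax", 50.0, 5.0), ("matmul", 1.0, 10.0)]
    print(plan_boundaries(ops, switch_cost=3.0))  # (13.0, ['FHE', 'MPC', 'FHE'])
```

With these made-up costs, moving the softmax-like operator to MPC saves more than the two conversions cost, so the plan switches out of FHE and back. Raising `switch_cost` high enough makes the all-FHE plan cheapest, which is the trade-off a conversion-aware cost model has to weigh.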

📝 Abstract

Transformer inference in machine-learning-as-a-service (MLaaS) raises privacy concerns for sensitive user inputs. Prior secure solutions that combine fully homomorphic encryption (FHE) and secure multiparty computation (MPC) are bottlenecked by inefficient FHE kernels, communication-heavy MPC protocols, and expensive FHE-MPC conversions. We present EncFormer, a two-party private Transformer inference framework that introduces Stage Compatible Patterns so that FHE kernels compose efficiently, reducing repacking and conversions. EncFormer also provides a cost analysis model built around a minimal-conversion baseline, enabling principled selection of FHE-MPC boundaries. To further reduce communication, EncFormer proposes a secure complex CKKS-MPC conversion protocol and designs communication-efficient MPC protocols for nonlinearities. With GPU optimizations, evaluations on GPT- and BERT-style models show that EncFormer achieves 1.4x-30.4x lower online MPC communication and 1.3x-9.8x lower end-to-end latency against prior hybrid FHE-MPC systems, and 1.9x-3.5x lower end-to-end latency on BERT-base than FHE-only pipelines under a matched backend, while maintaining near-plaintext accuracy on selected GLUE tasks.
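
The abstract does not describe how the complex CKKS-to-MPC conversion works. A common pattern in hybrid FHE–MPC systems is mask-then-decrypt: the server adds a random mask to the ciphertext, the client decrypts the masked value, and the two parties end up holding additive shares of the plaintext. Because CKKS slots hold complex numbers, one plausible way to exploit them (an assumption, not confirmed by the abstract) is to pack two real vectors as real and imaginary parts so that a single masked ciphertext yields shares of both. The sketch below simulates that flow in plain Python: complex lists stand in for ciphertexts, no encryption takes place, and the names `complex_he_to_shares`, `FRAC_BITS`, and `MASK_BOUND` are hypothetical. The bounded uniform masks are for illustration only and are not a security analysis.

```python
import random
from typing import List, Tuple

MOD = 1 << 64           # MPC arithmetic shares live in Z_{2^64} (assumption)
FRAC_BITS = 16          # fixed-point fractional bits on the MPC side (assumption)
MASK_BOUND = 2.0 ** 20  # range of the real-valued masks (assumption, not a security parameter)


def encode_fixed(v: float) -> int:
    """Encode a real number as a two's-complement fixed-point element of Z_{2^64}."""
    return round(v * (1 << FRAC_BITS)) % MOD


def decode_fixed(u: int) -> float:
    """Inverse of encode_fixed for values of modest magnitude."""
    if u >= MOD // 2:
        u -= MOD
    return u / (1 << FRAC_BITS)


# (shares of the real-part vector, shares of the imaginary-part vector)
Shares = Tuple[List[int], List[int]]


def complex_he_to_shares(a: List[float], b: List[float], rng: random.Random) -> Tuple[Shares, Shares]:
    """Simulate converting one 'ciphertext' holding a + i*b into additive shares of a and b.

    No encryption happens here: the complex list stands in for a CKKS ciphertext, and
    the client-side step stands in for decrypting the masked ciphertext.
    """
    packed = [complex(x, y) for x, y in zip(a, b)]  # two real vectors in one complex vector
    # Server: add a fresh complex mask to every slot (homomorphically, in a real system).
    masks = [complex(rng.uniform(-MASK_BOUND, MASK_BOUND),
                     rng.uniform(-MASK_BOUND, MASK_BOUND)) for _ in packed]
    masked = [z + m for z, m in zip(packed, masks)]
    # Client: sees only masked values; real and imaginary parts become two share vectors.
    client = ([encode_fixed(z.real) for z in masked], [encode_fixed(z.imag) for z in masked])
    # Server: keeps the negated masks as its shares.
    server = ([(-encode_fixed(m.real)) % MOD for m in masks],
              [(-encode_fixed(m.imag)) % MOD for m in masks])
    return client, server


def reconstruct(c: List[int], s: List[int]) -> List[float]:
    """Open additive shares: (c + s) mod 2^64, then decode from fixed point."""
    return [decode_fixed((x + y) % MOD) for x, y in zip(c, s)]


if __name__ == "__main__":
    rng = random.Random(0)
    a, b = [0.5, -1.25, 3.0], [2.0, 0.75, -4.5]
    client, server = complex_he_to_shares(a, b, rng)
    print(reconstruct(client[0], server[0]))  # approximately a
    print(reconstruct(client[1], server[1]))  # approximately b
```

Reconstruction matches the inputs to within roughly one fixed-point unit (2^-16 with these parameters), since the client and server round their shares independently; this mirrors the approximate arithmetic of CKKS itself.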

Problem

Research questions and friction points this paper is trying to address.

Transformer inference

privacy

fully homomorphic encryption

secure multiparty computation

machine-learning-as-a-service

Innovation

Methods, ideas, or system contributions that make the work stand out.

EncFormer

Fully Homomorphic Encryption (FHE)

Secure Multiparty Computation (MPC)