🤖 AI Summary
To address the low efficiency of synergistic homomorphic encryption (HE) and secure multi-party computation (MPC) in private Transformer inference—and particularly the high communication overhead incurred during cross-protocol conversions—this paper proposes BLB, a novel hybrid privacy-preserving inference framework. BLB introduces the first secure, low-overhead conversion protocol between the CKKS HE scheme and MPC. It decomposes Transformer layers at a fine-grained level and fuses linear operators to minimize interaction rounds and data transmission volume. Furthermore, it establishes a hybrid computation paradigm optimized for efficient encrypted matrix multiplication and Softmax evaluation. Experiments on BERT and GPT-2 demonstrate that BLB reduces communication overhead by 21× and latency by 13× compared to BOLT, and achieves 2× lower communication and 1.8× lower latency than Bumblebee. These results mark a significant advancement in the efficiency of privacy-preserving large-language-model inference.
📝 Abstract
This paper presents an efficient framework for private Transformer inference that combines Homomorphic Encryption (HE) and Secure Multi-party Computation (MPC) to protect data privacy. Existing methods often leverage HE for linear layers (e.g., matrix multiplications) and MPC for non-linear layers (e.g., Softmax activation functions), but the conversion between HE and MPC introduces significant communication costs. The proposed framework, dubbed BLB, overcomes this by breaking down layers into fine-grained operators and further fusing adjacent linear operators, reducing the need for HE/MPC conversions. To manage the increased ciphertext bit width from the fused linear operators, BLB proposes the first secure conversion protocol between CKKS and MPC and enables CKKS-based computation of the fused operators. Additionally, BLB proposes an efficient matrix multiplication protocol for fused computation in Transformers. Extensive evaluations on BERT-base, BERT-large, and GPT2-base show that BLB achieves a $21\times$ reduction in communication overhead compared to BOLT (S&P'24) and a $2\times$ reduction compared to Bumblebee (NDSS'25), along with latency reductions of $13\times$ and $1.8\times$, respectively, when leveraging GPU acceleration.
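To make the fusion idea concrete, here is a minimal plaintext sketch (not BLB's actual protocol, and the matrix names are illustrative): two adjacent linear operators $W_2(W_1 x + b_1) + b_2$ algebraically collapse into a single linear operator $(W_2 W_1)x + (W_2 b_1 + b_2)$, so an HE-based evaluation would need one encrypted matrix multiplication and one HE/MPC conversion instead of two.

```python
import numpy as np

# Conceptual plaintext sketch of fusing two adjacent linear operators.
# In a hybrid HE/MPC pipeline, each unfused linear op would incur its
# own encrypted evaluation and a protocol conversion in between;
# the fused form needs only one. W1, b1, W2, b2 are illustrative names.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W1, b1 = rng.standard_normal((4, 4)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((4, 4)), rng.standard_normal(4)

# Unfused: two sequential linear operators.
y_unfused = W2 @ (W1 @ x + b1) + b2

# Fused: a single equivalent operator, precomputed once.
W_fused = W2 @ W1
b_fused = W2 @ b1 + b2
y_fused = W_fused @ x + b_fused

assert np.allclose(y_unfused, y_fused)
```

Note that fusion widens the dynamic range of intermediate values, which is why the abstract mentions managing the increased ciphertext bit width via CKKS.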