🤖 AI Summary
To address the low inference efficiency and high communication overhead of large Transformer models protected jointly by homomorphic encryption (HE) and secret sharing (SS) in sensitive domains such as healthcare and finance, this paper proposes the first fine-grained HE/SS collaborative computation framework, FASTLMPI. The method introduces: (1) a segmented approximation technique for differentiable non-linear functions that significantly improves fitting accuracy for critical operators (SoftMax, LayerNorm, and GeLU) under low-degree polynomial constraints; and (2) co-designed, optimized protocols for matrix multiplication and the privacy-preserving non-linear modules. Experiments demonstrate that, compared to the S&P'24 BOLT system, the framework reduces inference latency by 54%-64% and communication volume by 72.2%, achieving strong cryptographic privacy guarantees at practical performance.
📝 Abstract
Homomorphic encryption (HE) and secret sharing (SS) enable computation on encrypted data, providing significant privacy benefits for large transformer-based models (TBM) in sensitive sectors such as medicine and finance. However, private TBM inference incurs significant costs due to the coarse-grained application of HE and SS. We present FASTLMPI, a new approach that accelerates private TBM inference through fine-grained computation optimization. Specifically, through the fine-grained co-design of homomorphic encryption and secret sharing, FASTLMPI achieves efficient protocols for matrix multiplication, SoftMax, LayerNorm, and GeLU. In addition, FASTLMPI introduces a precise segmented approximation technique for differentiable non-linear functions, improving fitting accuracy while keeping the polynomial degree low. Compared to BOLT (S&P'24), FASTLMPI shows a remarkable 54% to 64% decrease in runtime and an impressive 72.2% reduction in communication costs.