🤖 AI Summary
To address the low inference efficiency and high communication overhead of large Transformer models protected jointly by homomorphic encryption (HE) and secret sharing (SS) in sensitive domains such as healthcare and finance, this paper proposes the first fine-grained HE/SS collaborative computation framework, FASTLMPI. The method introduces: (1) a segmented approximation technique for differentiable non-linear functions that significantly improves fitting accuracy for critical operators (SoftMax, LayerNorm, and GeLU) under low-degree polynomial constraints; and (2) co-designed, optimized protocols for matrix multiplication and the privacy-preserving non-linear modules. Experiments demonstrate that, compared to the S&P'24 BOLT system, the framework reduces inference latency by 54%-64% and communication volume by 72.2%, achieving strong cryptographic privacy guarantees at practical performance.
📝 Abstract
Homomorphic encryption (HE) and secret sharing (SS) enable computation on encrypted data, providing significant privacy benefits for large transformer-based models (TBM) in sensitive sectors such as medicine and finance. However, private TBM inference incurs significant costs due to the coarse-grained application of HE and SS. We present FASTLMPI, a new approach that accelerates private TBM inference through fine-grained computation optimization. Specifically, through the fine-grained co-design of homomorphic encryption and secret sharing, FASTLMPI achieves efficient protocols for matrix multiplication, SoftMax, LayerNorm, and GeLU. In addition, FASTLMPI introduces a precise segmented approximation technique for differentiable non-linear functions, improving fitting accuracy while keeping the polynomial degree low. Compared to BOLT (S&P'24), FASTLMPI shows a remarkable 54% to 64% decrease in runtime and an impressive 72.2% reduction in communication costs.