🤖 AI Summary
Computing the softmax function of Transformers under homomorphic encryption is difficult because exponentiation and division induce a large dynamic range and high multiplicative depth. This work proposes MGF-softmax, a reformulation of softmax based on the moment generating function (MGF) that replaces the softmax denominator with a moment-based counterpart, which substantially reduces multiplicative depth while preserving key properties of softmax. The approximation converges asymptotically to the exact softmax as the number of input tokens grows. In experiments on Vision Transformers and large language models, MGF-softmax achieves inference accuracy close to that of high-depth exact implementations at substantially lower computational cost.
📝 Abstract
Homomorphic encryption (HE) is a prominent framework for privacy-preserving machine learning, enabling inference directly on encrypted data. However, evaluating softmax, a core component of transformer architectures, remains particularly challenging in HE due to its multivariate structure, the large dynamic range induced by exponential functions, and the need for accurate division during normalization. In this paper, we propose MGF-softmax, a novel softmax reformulation based on the moment generating function (MGF) that replaces the softmax denominator with its moment-based counterpart. This reformulation substantially reduces multiplicative depth while preserving key properties of softmax and asymptotically converging to the exact softmax as the number of input tokens increases. Extensive experiments on Vision Transformers and large language models show that MGF-softmax provides an efficient and accurate approximation of softmax in encrypted inference. In particular, it achieves inference accuracy close to that of high-depth exact methods, while requiring substantially lower computational cost through reduced multiplicative depth.
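To make the idea of a moment-based denominator concrete, the following plaintext sketch shows one possible reading of it. It is not taken from the paper: it assumes the inputs are roughly Gaussian, so that the empirical MGF at t = 1, namely (1/n) Σ exp(x_j), can be replaced by the closed form exp(μ + σ²/2). The function names `exact_softmax`, `gaussian_mgf_softmax`, and `normalization_gap` are hypothetical, and the actual MGF-softmax construction may differ.

```python
import math
import random


def exact_softmax(xs):
    """Reference softmax with the usual max-shift for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_mgf_softmax(xs):
    """Illustrative moment-based softmax (assumption: Gaussian-like inputs).

    The softmax denominator sum_j exp(x_j) equals n times the empirical MGF
    at t = 1. Under the Gaussian assumption that MGF is exp(mu + var / 2),
    so the data-dependent division becomes a subtraction in the exponent.
    """
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    log_mgf = mu + var / 2.0  # log of the Gaussian MGF evaluated at t = 1
    # Only a division by the sequence length n remains.
    return [math.exp(x - log_mgf) / n for x in xs]


def normalization_gap(n, seed=0):
    """Distance of the approximate outputs' sum from 1 for n Gaussian inputs."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return abs(sum(gaussian_mgf_softmax(xs)) - 1.0)


if __name__ == "__main__":
    for n in (16, 256, 4096):
        print(n, normalization_gap(n))
```

In this sketch the approximate outputs are a positive constant multiple of the exact softmax, so the ordering of the scores is preserved. They sum to 1 only approximately, and for Gaussian inputs the gap shrinks as the empirical MGF approaches its population value with growing n. The only remaining division is by the sequence length n, which is a multiplication by a known constant if n is public.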