Encryption-Friendly LLM Architecture

📅 2024-10-03
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the prohibitively high computational overhead of homomorphic encryption (HE) for large language model (LLM) inference in privacy-sensitive settings, this paper proposes the first HE-friendly lightweight LLM architecture that integrates LoRA fine-tuning with Gaussian kernel approximation. The method introduces quantization-aware attention and feed-forward network (FFN) designs to enable end-to-end secure inference after private fine-tuning. Its core innovation lies in embedding LoRA adapters into a Gaussian kernel-approximated Transformer, drastically reducing polynomial evaluation complexity under HE. Experiments demonstrate a 6.94× speedup in fine-tuning, a 2.3× acceleration in HE-based inference, and negligible accuracy degradation (<1% relative to plaintext baselines). The implementation is publicly available.
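The key architectural change described above, replacing softmax attention with a Gaussian kernel so the scores can be evaluated as a low-degree polynomial under HE, can be sketched as follows. This is a minimal plaintext NumPy illustration only: the function name, the bandwidth `gamma`, and the use of `np.exp` as a stand-in for the polynomial approximation are assumptions, not the paper's exact formulation.

```python
import numpy as np

def gaussian_kernel_attention(Q, K, V, gamma=0.5):
    """Attention with Gaussian-kernel scores instead of softmax.

    Softmax requires exp followed by a division (normalization), both
    expensive under homomorphic encryption. The Gaussian kernel
    exp(-gamma * ||q - k||^2) drops the normalization and can be
    approximated by a low-degree polynomial in the encrypted domain.
    """
    # Pairwise squared distances ||q - k||^2 via the expansion
    # ||q||^2 - 2 q.k + ||k||^2, broadcast over all query/key pairs.
    sq_dist = (np.sum(Q**2, axis=-1, keepdims=True)
               - 2.0 * (Q @ K.T)
               + np.sum(K**2, axis=-1))
    # In the HE setting this exp would be replaced by a polynomial.
    weights = np.exp(-gamma * sq_dist)
    return weights @ V
```

Because the kernel weights are unnormalized, no encrypted division is needed, which is one of the main sources of the reported inference speedup.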

📝 Abstract
Large language models (LLMs) offer personalized responses based on user interactions, but this use case raises serious privacy concerns. Homomorphic encryption (HE) is a cryptographic protocol supporting arithmetic computations in encrypted states and provides a potential solution for privacy-preserving machine learning (PPML). However, the computational intensity of transformers poses challenges for applying HE to LLMs. In this work, we propose a modified HE-friendly transformer architecture with an emphasis on inference following personalized (private) fine-tuning. Utilizing LoRA fine-tuning and Gaussian kernels, we achieve significant computational speedups -- 6.94x for fine-tuning and 2.3x for inference -- while maintaining performance comparable to plaintext models. Our findings provide a viable proof of concept for offering privacy-preserving LLM services in areas where data protection is crucial. Our code is available on GitHub.
Problem

Research questions and friction points this paper is trying to address.

Develop encryption-friendly LLM architecture
Address privacy in personalized LLM responses
Optimize transformers for homomorphic encryption
Innovation

Methods, ideas, or system contributions that make the work stand out.

Homomorphic encryption for privacy
LoRA fine-tuning for efficiency
Gaussian kernels for speed
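The LoRA point in the list above can be sketched as a forward pass in which the base weight stays frozen and only two small low-rank factors are trained, shrinking what must be updated during private fine-tuning. A minimal NumPy sketch, where the function name, rank `r`, and scaling `alpha` are illustrative assumptions rather than the paper's configuration:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16, r=4):
    """Forward pass with a LoRA adapter.

    W      : frozen pretrained weight, shape (d_out, d_in)
    A, B   : trainable low-rank factors, shapes (r, d_in) and (d_out, r)
    The effective weight is W + (alpha / r) * B @ A, but computing
    x @ A.T @ B.T keeps the cost proportional to the small rank r.
    """
    base = x @ W.T                              # frozen path
    update = (alpha / r) * (x @ A.T @ B.T)      # trainable low-rank path
    return base + update
```

Since only `A` and `B` change during fine-tuning, far fewer encrypted parameters are involved, which is where the reported 6.94x fine-tuning speedup comes from.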