🤖 AI Summary
To address the prohibitively high computational overhead of homomorphic encryption (HE) for large language model (LLM) inference in privacy-sensitive settings, this paper proposes an HE-friendly LLM architecture that integrates LoRA fine-tuning with Gaussian kernel approximation. The method introduces HE-friendly attention and feed-forward network (FFN) designs to enable end-to-end secure inference after private fine-tuning. Its core innovation lies in embedding LoRA adapters into a Gaussian-kernel-approximated transformer, drastically reducing polynomial evaluation complexity under HE. Experiments demonstrate a 6.94x speedup in fine-tuning, a 2.3x acceleration in HE-based inference, and performance comparable to plaintext baselines. The implementation is publicly available.
📝 Abstract
Large language models (LLMs) offer personalized responses based on user interactions, but this use case raises serious privacy concerns. Homomorphic encryption (HE) is a cryptographic scheme that supports arithmetic computation on encrypted data, making it a potential solution for privacy-preserving machine learning (PPML). However, the computational intensity of transformers poses challenges for applying HE to LLMs. In this work, we propose a modified HE-friendly transformer architecture with an emphasis on inference following personalized (private) fine-tuning. Using LoRA fine-tuning and Gaussian kernels, we achieve significant computational speedups (6.94x for fine-tuning and 2.3x for inference) while maintaining performance comparable to plaintext models. Our findings provide a viable proof of concept for privacy-preserving LLM services in domains where data protection is crucial. Our code is available on GitHub.
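The two ingredients the abstract names can be illustrated with a minimal NumPy sketch (the function names, shapes, and the bandwidth parameter `sigma` below are illustrative assumptions, not the paper's actual API). Gaussian-kernel attention replaces softmax with an unnormalized kernel, avoiding the division and max operations that are expensive under HE, while a LoRA layer adds a trainable low-rank update to a frozen base weight so that private fine-tuning only touches the small adapter matrices:

```python
import numpy as np

def gaussian_kernel_attention(Q, K, V, sigma=1.0):
    """Softmax-free attention: weights come from an unnormalized Gaussian kernel.

    Under HE, exp(-x) on a bounded range can be approximated by a low-degree
    polynomial, and no division (softmax normalization) is required.
    """
    # Pairwise squared distances ||q_i - k_j||^2, shape (n_queries, n_keys).
    d2 = ((Q[:, None, :] - K[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * sigma**2))  # kernel weights in (0, 1]
    return W @ V

def lora_forward(x, W0, A, B, alpha=1.0):
    """LoRA: y = x @ (W0 + alpha * A @ B), with W0 frozen and only A, B trained."""
    return x @ W0 + alpha * (x @ A) @ B
```

Because only the low-rank factors `A` and `B` are updated, personalized fine-tuning under encryption involves far fewer ciphertext operations than full fine-tuning, which is consistent with the speedups the abstract reports.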