🤖 AI Summary
To address the high memory overhead and limited utility of DP-SGD when privately fine-tuning large language models (LLMs), this paper proposes DP-ZO, a framework that integrates zeroth-order (ZO) optimization with differential privacy (DP). By eliminating explicit gradient computation, DP-ZO estimates a directional update from scalar loss evaluations and injects calibrated noise only into that scalar step size along a random direction, yielding (ε,δ)-DP with Gaussian-style noise or pure ε-DP with the Laplace mechanism. Because no per-example gradients are materialized, this design circumvents the memory bottleneck of DP-SGD on very large models and substantially reduces GPU memory consumption. Experiments across multiple tasks and model scales show that DP-ZO matches DP-SGD's utility under (ε,δ)-DP and surpasses it under pure ε-DP. The key insight is that ZO's gradient-free structure confines the training data's influence to a single scalar per step, so privacy can be enforced by scalar-level noise injection, making private LLM adaptation both scalable and practical.
📝 Abstract
Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner, but has proven difficult to scale to the era of foundation models. We introduce DP-ZO, a private fine-tuning framework for large language models that privatizes zeroth-order optimization methods. A key insight in the design of our method is that the direction of the gradient in the zeroth-order optimization we use is random, and the only information from training data is the step size, i.e., a scalar. Therefore, we only need to privatize the scalar step size, which is memory-efficient. DP-ZO provides a strong privacy-utility trade-off across different tasks and model sizes, comparable to DP-SGD under $(\varepsilon,\delta)$-DP. Notably, DP-ZO possesses significant advantages over DP-SGD in memory efficiency, and obtains higher utility under $\varepsilon$-DP when using the Laplace mechanism.
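The abstract's key insight — only a scalar step size depends on the training data — can be sketched as one privatized zeroth-order (SPSA-style) update. This is a minimal illustrative sketch, not the paper's implementation: the function name `dp_zo_step` and all hyperparameter values are hypothetical, and per-example scalar estimates are clipped and Laplace-noised (the pure ε-DP variant) before averaging.

```python
import numpy as np

def dp_zo_step(params, loss_fn, batch, lr=1e-4, mu=1e-3,
               clip=1.0, noise_scale=0.5, rng=None):
    """Hypothetical sketch of one DP-ZO update.

    The random direction z is data-independent, so only the scalar
    directional-derivative estimate per example touches private data.
    Clipping bounds its sensitivity; Laplace noise on the summed
    scalar gives a pure epsilon-DP step (assumed variant).
    """
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(params.shape)  # public random direction
    scalars = []
    for example in batch:
        # Two scalar loss evaluations estimate the derivative along z.
        g = (loss_fn(params + mu * z, example)
             - loss_fn(params - mu * z, example)) / (2 * mu)
        scalars.append(np.clip(g, -clip, clip))  # bound sensitivity
    # Noise a single scalar -- no per-parameter noise is needed.
    noisy = (sum(scalars) + rng.laplace(scale=noise_scale)) / len(batch)
    return params - lr * noisy * z
```

The memory advantage follows directly: the update needs only forward passes and one random direction, never a per-example gradient tensor.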