DP-SelFT: Differentially Private Selective Fine-Tuning for Large Language Models

📅 2026-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the utility degradation that large language models suffer when fine-tuned under differential privacy (DP), which stems primarily from gradient clipping and noise injection. The authors propose DP-SelFT, a framework that combines selective fine-tuning with DP training. DP-SelFT first builds a lightweight DP synthetic dataset and performs parameter selection only on that data, so the selection stage incurs no additional privacy cost. It selects layers by temporarily training candidate layer subsets under perturbations matched in scale to the DP noise, which favors subsets that stay useful under clipped and noisy updates. On benchmark tasks, DP-SelFT consistently improves the privacy-utility trade-off over existing DP fine-tuning baselines under the same privacy guarantees.
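
As background for the clipping and noise injection mentioned above, the following is a minimal sketch of one generic DP-SGD gradient aggregation step, not the paper's implementation. The function names, and the parameters `clip_norm` and `noise_multiplier`, are illustrative assumptions; the sketch uses only the Python standard library and represents gradients as plain lists.

```python
import math
import random


def clip_by_l2_norm(grad, clip_norm):
    """Scale grad so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average.

    Assumption: noise standard deviation is noise_multiplier * clip_norm,
    the usual Gaussian-mechanism calibration for a sum of clipped gradients.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        clipped = clip_by_l2_norm(grad, clip_norm)
        total = [t + c for t, c in zip(total, clipped)]
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [n / len(per_example_grads) for n in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Clipping bounds each example's influence on the update, and the added noise hides what remains of it. Both distort the gradient, which is the source of the utility loss the summary describes.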

📝 Abstract

Large language models (LLMs) are commonly adapted to downstream tasks through fine-tuning, but fine-tuning data often contains sensitive information that may be leaked by the resulting model. Differential privacy (DP) offers formal protection against such leakage, yet DP fine-tuning of LLMs still suffers from substantial utility degradation due to gradient clipping and noise injection. Existing work improves this trade-off by combining DP with parameter-efficient fine-tuning methods such as LoRA, which constrain the form of updates. In this work, we study a complementary direction: selective fine-tuning, which constrains where updates are applied. We propose DP-SelFT, a framework for differentially private selective fine-tuning of LLMs. DP-SelFT addresses three DP-specific challenges in parameter selection: avoiding repeated privacy cost, improving stability under noisy estimates, and selecting parameters that remain useful under clipped and noisy updates. It first constructs a lightweight DP synthetic dataset and performs selection only on this synthetic data, so the selection stage incurs no additional privacy cost. It then conducts layer-level selection by temporarily training candidate layer subsets on a synthetic training split and evaluating them on a synthetic validation split. Crucially, this temporary training is performed under a perturbation regime matched to downstream DP fine-tuning, with worst-case perturbations of the same scale as DP noise. This favors layer subsets that are not only learnable but also robust to noisy private updates. Experiments on benchmark tasks show that DP-SelFT consistently improves the privacy-utility trade-off over existing DP fine-tuning baselines under the same privacy guarantees.
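
The layer-selection stage described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm. It assumes a linear model whose weights are grouped into named "layers", a mean-squared-error loss, an exhaustive search over subsets of size `k`, and a first-order (SAM-style) approximation of the worst-case perturbation with radius `rho` standing in for the DP noise scale. All names (`make_toy_split`, `train_subset`, `select_layers`) are hypothetical, and the toy data merely stands in for a DP synthetic dataset.

```python
import itertools
import math
import random


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def mse_grad(w, data):
    grad = [0.0] * len(w)
    for x, y in data:
        err = predict(w, x) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / len(data)
    return grad


def make_toy_split(n, rng):
    """Hypothetical stand-in for a DP synthetic split: only weights 2 and 3 matter."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(6)]
        data.append((x, 2.0 * x[2] - 1.0 * x[3]))
    return data


def train_subset(w0, layers, subset, train, rho, lr=0.05, steps=200):
    """Temporarily train only the layers in `subset`.

    Each step evaluates the gradient at a perturbed point: the trainable
    weights are moved a distance rho along the normalized gradient, a
    first-order approximation of the worst case within that radius.
    """
    trainable = sorted(j for name in subset for j in layers[name])
    w = list(w0)
    for _ in range(steps):
        g = mse_grad(w, train)
        norm = math.sqrt(sum(g[j] ** 2 for j in trainable))
        w_adv = list(w)
        if norm > 0:
            for j in trainable:
                w_adv[j] += rho * g[j] / norm
        g_adv = mse_grad(w_adv, train)
        for j in trainable:
            w[j] -= lr * g_adv[j]
    return w


def select_layers(w0, layers, k, train, val, rho):
    """Return (subset, validation loss) for the best size-k layer subset."""
    best = None
    for subset in itertools.combinations(sorted(layers), k):
        w = train_subset(w0, layers, subset, train, rho)
        score = mse(w, val)
        if best is None or score < best[1]:
            best = (subset, score)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    train, val = make_toy_split(40, rng), make_toy_split(20, rng)
    layers = {"a": [0, 1], "b": [2, 3], "c": [4, 5]}
    print(select_layers([0.0] * 6, layers, 1, train, val, rho=0.1))
```

Because both splits are synthetic, the search can train and score as many candidate subsets as needed without touching the private data again. That is the sense in which the abstract says selection incurs no additional privacy cost.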

Problem

Research questions and friction points this paper is trying to address.

differential privacy

large language models

fine-tuning

privacy-utility trade-off

selective fine-tuning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Differential Privacy

Selective Fine-Tuning

Large Language Models