Efficient and Privacy-Preserving Soft Prompt Transfer for LLMs

📅 2025-06-19
🤖 AI Summary
Soft prompts for large language models (LLMs) suffer from three key limitations: poor cross-model generalization, reliance on the original private training data during optimization (posing privacy risks), and high computational overhead. To address these challenges, we propose the first soft prompt framework that supports both localized training and cross-model transfer. Our approach comprises three core components: (1) a knowledge distillation-based small-model proxy that removes the need to share sensitive source data with the LLM provider; (2) differential privacy guarantees for secure local soft prompt optimization; and (3) a lightweight cross-model transfer strategy that deploys soft prompts across diverse LLMs efficiently and with minimal fidelity loss. Extensive experiments demonstrate that our method matches end-to-end fine-tuning performance across multiple tasks, substantially reduces computational cost, keeps private data local, and achieves strong trade-offs among accuracy, privacy preservation, and cross-model generalizability.
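To make the object being transferred concrete: a soft prompt is a small matrix of trainable embedding vectors prepended to the (frozen) token embeddings of the task input. A minimal numpy sketch with toy dimensions (the sizes, initialization scale, and variable names here are illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 8        # embedding dimension (toy size)
prompt_len = 4     # number of soft prompt tokens
seq_len = 6        # task input tokens

# Trainable soft prompt: a (prompt_len, d_model) matrix of free parameters,
# typically initialized near zero and updated by gradient descent.
soft_prompt = rng.normal(scale=0.02, size=(prompt_len, d_model))

# Frozen input embeddings for the user's tokens (a stand-in for the LLM's
# embedding lookup, which stays fixed during prompt tuning).
input_embeds = rng.normal(size=(seq_len, d_model))

# The model consumes the soft prompt prepended to the input embeddings;
# only `soft_prompt` receives gradient updates.
model_input = np.concatenate([soft_prompt, input_embeds], axis=0)

print(model_input.shape)  # (10, 8)
```

Because the prompt lives in the embedding space of a specific model, it is tightly coupled to that model, which is exactly the transferability problem the paper targets.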

📝 Abstract
Prompting has become a dominant paradigm for adapting large language models (LLMs). While discrete (textual) prompts are widely used for their interpretability, soft (parameter) prompts have recently gained traction in APIs. This is because they can encode information from more training samples while minimizing the user's token usage, leaving more space in the context window for task-specific input. However, soft prompts are tightly coupled to the LLM they are tuned on, limiting their generalization to other LLMs. This constraint is particularly problematic for efficiency and privacy: (1) tuning prompts on each LLM incurs high computational costs, especially as LLMs continue to grow in size. Additionally, (2) when the LLM is hosted externally, soft prompt tuning often requires sharing private data with the LLM provider. For instance, this is the case with the NVIDIA NeMo API. To address these issues, we propose POST (Privacy Of Soft prompt Transfer), a framework that enables private tuning of soft prompts on a small model and subsequently transfers these prompts to a larger LLM. POST uses knowledge distillation to derive a small model directly from the large LLM to improve prompt transferability, tunes the soft prompt locally, optionally with differential privacy guarantees, and transfers it back to the larger LLM using a small public dataset. Our experiments show that POST reduces computational costs, preserves privacy, and effectively transfers high-utility soft prompts.
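The distillation step that derives the small proxy model is typically trained to match the large LLM's output distribution, e.g. via a temperature-scaled KL divergence. A self-contained numpy sketch of such a loss (`distill_loss`, the temperature value, and the toy logits are illustrative assumptions, not the paper's exact objective):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) at temperature T, averaged over the batch."""
    p = softmax(teacher_logits, T)           # soft targets from the large LLM
    log_q = np.log(softmax(student_logits, T))
    return float((p * (np.log(p) - log_q)).sum(axis=-1).mean())

teacher = np.array([[2.0, 0.5, -1.0]])
student_matched = teacher.copy()
student_off = np.array([[0.0, 2.0, 0.0]])

print(distill_loss(teacher, student_matched))  # 0.0: identical distributions
print(distill_loss(teacher, student_off) > 0)  # True: mismatch is penalized
```

Deriving the student directly from the target LLM (rather than using an unrelated small model) keeps the two embedding spaces aligned, which is what makes the tuned prompt transferable back to the teacher.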
Problem

Research questions and friction points this paper is trying to address.

Soft prompts lack generalization across different large language models.
High computational costs from tuning prompts on each LLM.
Privacy risks when sharing data for external soft prompt tuning.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge distillation to derive a small proxy model from the large LLM
Local soft prompt tuning, optionally with differential privacy guarantees
Prompt transfer back to the large LLM via a small public dataset
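Differentially private local tuning of this kind is commonly implemented with DP-SGD-style updates: clip each example's gradient, average, and add Gaussian noise before the parameter step. A minimal numpy sketch (`dp_sgd_step`, the learning rate, clip norm, and noise multiplier are illustrative assumptions, not the paper's exact recipe):

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    """One DP-SGD update on the soft prompt parameters:
    clip each per-example gradient to `clip_norm`, average, add Gaussian noise."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    avg = np.mean(clipped, axis=0)
    # Noise scale follows the clipped sensitivity divided by the batch size.
    noise = rng.normal(scale=noise_mult * clip_norm / len(per_example_grads),
                       size=avg.shape)
    return params - lr * (avg + noise)

prompt = np.zeros(4)  # toy soft prompt (flattened)
grads = [np.array([3.0, 0.0, 0.0, 0.0]),   # large gradient -> gets clipped
         np.array([0.0, 0.5, 0.0, 0.0])]   # small gradient -> unchanged
new_prompt = dp_sgd_step(prompt, grads)
print(new_prompt.shape)  # (4,)
```

Because clipping bounds each example's influence and the noise masks the remainder, the tuned prompt itself carries a formal privacy guarantee before it is ever transferred to the external LLM.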