Model-Based Privacy-Preserving Knowledge Transfer for Large Language Models

📅 2024-10-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the privacy–utility trade-off in domain knowledge transfer for large language models (LLMs), this paper proposes Llamdex, a model-level knowledge transfer framework in which the LLM never accesses the original sensitive data. Llamdex departs from data-dependent paradigms such as retrieval-augmented generation (RAG), which compromises the privacy of sensitive data, and differentially private data synthesis, which often yields poor utility. Instead, a model trained on the domain-specific data is integrated into the base LLM through connection modules, so that domain knowledge is transferred through the model rather than through the data. Experiments show that, under the same differential privacy constraints, Llamdex improves domain task accuracy by up to 26% over state-of-the-art data synthesis methods while keeping inference efficiency comparable to that of the original LLM.

📝 Abstract

As large language models (LLMs) become more prevalent, effectively utilizing domain-specific knowledge while ensuring privacy has become critical. Existing methods often struggle to balance utility and privacy. For instance, retrieval-augmented generation (RAG) enables LLMs to access domain-specific knowledge but compromises the privacy of sensitive data. On the other hand, differentially private data synthesis techniques offer strong privacy guarantees but often result in poor utility. To address this challenge, we propose Llamdex, a novel framework that enhances LLMs using only models trained on domain-specific data, integrated into LLMs through carefully designed connection modules. Our approach significantly enhances the accuracy of domain-specific tasks, achieving up to a 26% accuracy improvement compared to state-of-the-art data synthesis methods under the same differential privacy constraints. Experimental results show that Llamdex not only improves the accuracy of LLM responses but also maintains comparable inference efficiency to the original LLM, highlighting its potential for real applications.
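
The abstract describes the architecture only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes a connector that projects an LLM hidden state into the feature space of a domain model, queries that model, and adds a projection of its output back to the hidden state. The names `Connector`, `toy_domain_model`, the dimensions, and the residual-style update are all hypothetical and do not come from the paper.

```python
import random


def linear(weights, vec):
    """Multiply a row-major weight matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights standing in for trained connector parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class Connector:
    """Hypothetical connector between a base LLM and a domain model.

    Assumption: the LLM side sees only the domain model's outputs,
    never the private data the domain model was trained on.
    """

    def __init__(self, hidden_dim, feature_dim, output_dim, seed=0):
        rng = random.Random(seed)
        self.to_features = random_matrix(feature_dim, hidden_dim, rng)
        self.to_hidden = random_matrix(hidden_dim, output_dim, rng)

    def forward(self, hidden, domain_model):
        # Project the LLM hidden state into the domain model's input space.
        features = linear(self.to_features, hidden)
        # Query the domain model; only the model is shared, not its data.
        prediction = domain_model(features)
        # Project the prediction back and add it to the hidden state.
        correction = linear(self.to_hidden, prediction)
        return [h + c for h, c in zip(hidden, correction)]


def toy_domain_model(features):
    """Stand-in for a model trained by the data owner on private data."""
    return [1.0 if sum(features) > 0 else 0.0]


connector = Connector(hidden_dim=4, feature_dim=3, output_dim=1)
hidden_state = [0.5, -0.2, 0.1, 0.9]
augmented = connector.forward(hidden_state, toy_domain_model)
```

In this sketch the base model and the domain model stay unchanged and only the connector would need training, which is one plausible reading of "carefully designed connection modules". Any differential privacy guarantee would have to come from how the domain model itself is trained, which this example does not show.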

Problem

Research questions and friction points this paper is trying to address.

Enhance LLMs with domain-specific knowledge

Ensure privacy in knowledge transfer

Balance utility and privacy effectively

Innovation

Methods, ideas, or system contributions that make the work stand out.

Model-based privacy-preserving transfer

Integration through connection modules

Improved accuracy under differential privacy constraints

🔎 Similar Papers

PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration