🤖 AI Summary
Supervised fine-tuning (SFT) for knowledge updating in large language models (LLMs) suffers from poor generalization and fails to match the performance of retrieval-augmented generation (RAG). Method: This paper proposes a lightweight prompt-distillation fine-tuning framework that embeds new knowledge into model weights via self-distillation: the teacher model's output distribution under knowledge-enriched prompts is distilled into the student model's LoRA adapter. The student is otherwise identical to the teacher, and training uses synthetically generated question-answer pairs with an output-distribution matching loss. Contribution/Results: The method reaches RAG-level performance on knowledge-updating benchmarks while eliminating the need for external retrieval or knowledge-bearing prompts at inference time, improving deployment efficiency and response consistency by consolidating the new knowledge directly into the model's weights.
📝 Abstract
In many practical applications, large language models (LLMs) need to incorporate new knowledge not present in their pre-training data. The primary methods for this are fine-tuning and retrieval-augmented generation (RAG). Although RAG has emerged as the industry standard for knowledge injection, fine-tuning has not yet achieved comparable success. In this paper, we propose a new fine-tuning technique for learning new knowledge and show that it can match the performance of RAG. The proposed method is based on self-distillation, and we call it prompt distillation. First, we generate question-answer pairs about the new knowledge. Then, we fine-tune a student model on these question-answer pairs to imitate the output distributions of a teacher model, which additionally receives the new knowledge in its prompt. The student model is identical to the teacher, except that it is equipped with a LoRA adapter. This training procedure distills the new knowledge from the teacher's prompt into the student's weights.
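The core of the training objective described above is a distribution-matching loss between teacher and student. The sketch below illustrates that idea on a toy next-token distribution; the function names, the temperature parameter, and the use of a plain KL divergence are illustrative assumptions, not the paper's exact implementation (which operates on full LLM logits over a real vocabulary).

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): how far the student's distribution q is from the teacher's p."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def prompt_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Per-token distillation loss (illustrative sketch).

    teacher_logits: next-token logits when the prompt contains the new knowledge.
    student_logits: next-token logits from the LoRA-equipped student, which sees
                    only the question. Minimizing this KL pushes the student's
                    predictions toward the teacher's, transferring the knowledge
                    from the teacher's prompt into the student's adapter weights.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return kl_divergence(teacher_probs, student_probs)
```

In practice this loss would be averaged over all answer tokens of the generated question-answer pairs, with gradients flowing only into the LoRA parameters of the student.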