Knowledge Injection via Prompt Distillation

📅 2024-12-19
🏛️ arXiv.org
📈 Citations: 5
Influential: 1
🤖 AI Summary
Supervised fine-tuning (SFT) for knowledge updating in large language models (LLMs) generalizes poorly and fails to match the performance of retrieval-augmented generation (RAG). Method: This paper proposes a lightweight prompt distillation fine-tuning framework that embeds new knowledge into model weights via self-distillation: the output distribution of a teacher model that receives the new knowledge in its prompt is distilled into the parameters of a student model. The student is identical to the teacher except for a LoRA adapter, and training uses synthetically generated question-answer pairs with an output-distribution matching loss. Contribution/Results: The method reaches RAG-level performance across diverse knowledge-updating benchmarks while eliminating the need for external retrieval or knowledge-enriched prompts at inference, improving deployment efficiency and response consistency.

📝 Abstract
In many practical applications, large language models (LLMs) need to incorporate new knowledge not present in their pre-training data. The primary methods for this are fine-tuning and retrieval-augmented generation (RAG). Although RAG has emerged as the industry standard for knowledge injection, fine-tuning has not yet achieved comparable success. In this paper, we propose a new fine-tuning technique for learning new knowledge and show that it can reach the performance of RAG. The proposed method is based on the self-distillation approach, which we call prompt distillation. First, we generate question-answer pairs about the new knowledge. Then, we fine-tune a student model on the question-answer pairs to imitate the output distributions of a teacher model, which additionally receives the new knowledge in its prompt. The student model is identical to the teacher, except it is equipped with a LoRA adapter. This training procedure facilitates distilling the new knowledge from the teacher's prompt into the student's weights.
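The core of the training procedure described above is an output-distribution matching loss: the student (base model plus LoRA adapter) answers a question without the new knowledge, while the frozen teacher answers with the knowledge in its prompt, and the student is trained to match the teacher's token distributions. The following is a minimal sketch of such a distillation loss in PyTorch; the function name, temperature parameter, and toy tensors are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def prompt_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence pushing the student's next-token distribution toward
    the teacher's. In the paper's setup, teacher_logits come from the same
    base model given the new knowledge in its prompt; student_logits come
    from the base model + LoRA adapter without that context. Only the LoRA
    parameters would receive gradients (the teacher is run under no_grad)."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits.detach() / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # batchmean is the mathematically correct reduction for KL divergence.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)

# Toy check: identical distributions give (near-)zero loss.
vocab_size = 32
teacher_logits = torch.randn(4, vocab_size)   # 4 token positions
zero_loss = prompt_distillation_loss(teacher_logits.clone(), teacher_logits)
```

In a full training loop this loss would be computed over the answer tokens of each synthetic question-answer pair, with the teacher's forward pass conditioned on the injected knowledge and the student's LoRA weights updated by backpropagation.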
Problem

Research questions and friction points this paper is trying to address.

Inject new knowledge into LLMs efficiently
Improve fine-tuning for factual knowledge acquisition
Compare prompt distillation with RAG and fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-distillation for knowledge injection
No separate teacher model: the teacher is the same base model given the new knowledge in its prompt
Matches RAG performance and outperforms standard fine-tuning
Kalle Kujanpaa
Aalto University, Department of Computer Science
Harri Valpola
System 2 AI
Alexander Ilin
Aalto University