TuneShift-KD: Knowledge Distillation and Transfer for Fine-tuned Models

📅 2026-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently transferring domain-specific knowledge from a fine-tuned model to a new base architecture when the original proprietary training data is inaccessible. The authors propose an automated, data-free knowledge distillation method that identifies critical knowledge regions by analyzing discrepancies in perplexity between the fine-tuned and base models. Leveraging only a small set of representative prompts, the approach generates synthetic training data and combines iterative prompt expansion with parameter-efficient fine-tuning (e.g., LoRA) to effectively transfer knowledge. Notably, the method operates without any auxiliary discriminator and achieves state-of-the-art performance across multiple tasks, demonstrating high accuracy and flexibility in data-free knowledge transfer.
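The perplexity-gap criterion described in the summary can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names, the threshold, and the per-token log-probability inputs are all assumptions made for the example.

```python
import math

def perplexity(token_logprobs):
    """Perplexity is the exponential of the negative mean token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def select_specialized_prompts(prompts, base_logprobs, tuned_logprobs, gap_threshold=2.0):
    """Keep prompts where the fine-tuned model is confident (low perplexity)
    but the base model struggles (high perplexity).

    Each element of base_logprobs / tuned_logprobs is the list of per-token
    log-probabilities each model assigned to its response for that prompt.
    The gap_threshold value here is illustrative, not taken from the paper.
    """
    selected = []
    for prompt, base_lp, tuned_lp in zip(prompts, base_logprobs, tuned_logprobs):
        gap = perplexity(base_lp) - perplexity(tuned_lp)
        if gap >= gap_threshold:
            selected.append(prompt)
    return selected
```

In practice the log-probabilities would come from scoring each prompt's response under both models; prompts that clear the gap threshold are treated as probes into the fine-tuned model's specialized knowledge.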

📝 Abstract
To embed domain-specific or specialized knowledge into pre-trained foundation models, fine-tuning using techniques such as parameter-efficient fine-tuning (e.g., LoRA) is a common practice. However, as new LLM architectures and pre-trained models emerge, transferring this specialized knowledge to newer models becomes an important task. In many scenarios, the original specialized data may be unavailable due to privacy or commercial restrictions, necessitating distillation and transfer of this specialized knowledge from the fine-tuned base model to a different pre-trained model. We present TuneShift-KD, a novel approach that automatically distills specialized knowledge from a fine-tuned model to a target model using only a few examples representative of the specialized information. Our key insight is that specialized knowledge can be identified through perplexity differences between base and fine-tuned models: prompts where the fine-tuned model responds confidently (low perplexity) but the base model struggles (high perplexity) indicate queries corresponding to the specialized knowledge learned by the fine-tuned model. TuneShift-KD leverages this insight to create a synthetic training dataset to transfer the specialized knowledge. Using an iterative process, TuneShift-KD generates additional prompts similar to those that elicited responses containing specialized knowledge. TuneShift-KD does not require training discriminators or access to training datasets. It is an automated approach that only requires the initial fine-tuned and base models and a few representative prompts. Our experiments demonstrate that models fine-tuned using TuneShift-KD achieve higher accuracy than prior approaches, enabling ease of deployment and more effective transfer of the specialized knowledge.
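The iterative process the abstract describes can be summarized as a simple expansion loop. In this hedged sketch, `expand`, `respond`, and `is_specialized` are hypothetical hooks standing in for the LLM-driven prompt generator, the fine-tuned model, and the perplexity-gap test respectively; the resulting (prompt, response) pairs would then feed a parameter-efficient fine-tuning run (e.g., LoRA) on the target model.

```python
def build_synthetic_dataset(seed_prompts, expand, respond, is_specialized, rounds=2):
    """Iteratively grow a synthetic training set for data-free distillation.

    seed_prompts:   the few representative prompts the method starts from.
    expand:         hypothetical hook generating prompts similar to a given one.
    respond:        hypothetical hook querying the fine-tuned model.
    is_specialized: hypothetical hook applying the perplexity-gap test.
    """
    frontier = list(seed_prompts)
    dataset = []
    for _ in range(rounds):
        # Keep only prompts that probe the fine-tuned model's specialized knowledge.
        kept = [p for p in frontier if is_specialized(p)]
        # Record the fine-tuned model's responses as synthetic training pairs.
        dataset.extend((p, respond(p)) for p in kept)
        # Expand only around prompts that hit specialized knowledge.
        frontier = [q for p in kept for q in expand(p)]
    return dataset
```

The loop structure mirrors the abstract's description: each round filters the current prompts, harvests responses from the fine-tuned model, and expands around the survivors, without ever touching the original training data.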
Problem

Research questions and friction points this paper is trying to address.

knowledge distillation
model transfer
fine-tuned models
specialized knowledge
privacy constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

knowledge distillation
model transfer
perplexity-based selection
parameter-efficient fine-tuning
synthetic data generation
Yushi Guan
Department of Computer Science, University of Toronto; Vector Institute, Toronto, Canada
Jeanine Ohene-Agyei
Department of Computer Science, University of Toronto
Daniel Kwan
Department of Computer Science, University of Toronto
Jean Sebastien Dandurand
Department of Computer Science, University of Toronto
Yifei Zhang
Department of Computer Science, University of Toronto
Nandita Vijaykumar
Assistant Professor, University of Toronto
Computer Systems and Architecture