Hybrid Policy Distillation for LLMs

📅 2026-04-22
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing knowledge distillation methods for large language models face performance and stability bottlenecks because divergence direction, optimization strategy, and data regime are tightly coupled. This work presents a unified view that reformulates distillation as a token-level reweighted log-likelihood objective. It proposes Hybrid Policy Distillation (HPD), which combines the mode-covering behavior of forward KL with the mode-seeking behavior of reverse KL, and pairs off-policy data with lightweight, approximate on-policy sampling. The authors report improved optimization stability, computational efficiency, and final performance on long-generation math reasoning and on short-generation dialogue and code tasks, across diverse model families and scales.
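
For readers unfamiliar with the two divergence directions, the standard per-token definitions are written out here. The symbols are assumptions introduced for illustration and are not taken from the paper: p is the teacher distribution, q_θ is the student distribution, V is the vocabulary, and c_t is the context (prompt plus previously generated tokens) at position t. The last line is a generic convex mix with an assumed weight α, shown only to make the idea of "hybrid" concrete; it should not be read as the exact HPD objective.

```latex
% Forward KL (teacher-weighted expectation; tends toward mode coverage)
\mathrm{KL}\left(p \,\|\, q_\theta\right)
  = \sum_{v \in V} p(v \mid c_t)\,
    \log \frac{p(v \mid c_t)}{q_\theta(v \mid c_t)}

% Reverse KL (student-weighted expectation; tends toward mode seeking)
\mathrm{KL}\left(q_\theta \,\|\, p\right)
  = \sum_{v \in V} q_\theta(v \mid c_t)\,
    \log \frac{q_\theta(v \mid c_t)}{p(v \mid c_t)}

% Illustrative mix with an assumed weight \alpha \in [0, 1]
\mathcal{L}_t(\alpha)
  = \alpha\,\mathrm{KL}\left(p \,\|\, q_\theta\right)
  + (1 - \alpha)\,\mathrm{KL}\left(q_\theta \,\|\, p\right)
```

Forward KL penalizes the student wherever the teacher places mass that the student misses, while reverse KL penalizes the student for placing mass where the teacher has little.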

📝 Abstract
Knowledge distillation (KD) is a powerful paradigm for compressing large language models (LLMs), whose effectiveness depends on intertwined choices of divergence direction, optimization strategy, and data regime. We break down the design of existing KD methods and present a unified view that establishes connections between them, reformulating KD as a reweighted log-likelihood objective at the token level. We further propose Hybrid Policy Distillation (HPD), which integrates the complementary advantages of forward and reverse KL to balance mode coverage and mode-seeking, and combines off-policy data with lightweight, approximate on-policy sampling. We validate HPD on long-generation math reasoning as well as short-generation dialogue and code tasks, demonstrating improved optimization stability, computational efficiency, and final performance across diverse model families and scales. The code related to this work is available at https://github.com/zwhong714/Hybrid-Policy-Distillation.
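
To make the forward/reverse KL trade-off concrete, here is a minimal pure-Python sketch that averages a weighted mix of the two divergences over token positions. The function names (`softmax`, `kl`, `hybrid_kl_loss`) and the mixing weight `alpha` are hypothetical and chosen for illustration; this is a generic interpolation under those assumptions, not the HPD objective or the code from the linked repository.

```python
import math


def softmax(logits):
    """Convert one position's logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
    )


def hybrid_kl_loss(teacher_logits, student_logits, alpha=0.5):
    """Average over positions of alpha * forward KL + (1 - alpha) * reverse KL.

    teacher_logits, student_logits: lists of per-position logit lists.
    alpha: assumed mixing weight (hypothetical, not from the paper).
    """
    total = 0.0
    for t_logits, s_logits in zip(teacher_logits, student_logits):
        p = softmax(t_logits)  # teacher distribution
        q = softmax(s_logits)  # student distribution
        forward = kl(p, q)     # teacher-weighted, mode-covering
        reverse = kl(q, p)     # student-weighted, mode-seeking
        total += alpha * forward + (1.0 - alpha) * reverse
    return total / len(teacher_logits)


if __name__ == "__main__":
    teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
    student = [[0.0, 0.0, 0.0], [1.0, -1.0, 0.5]]
    for a in (0.0, 0.5, 1.0):
        print(a, hybrid_kl_loss(teacher, student, alpha=a))
```

Setting `alpha=1.0` recovers pure forward KL and `alpha=0.0` pure reverse KL. In a real training loop the same quantity would be computed on batched tensors with gradients flowing only through the student.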
Problem

Research questions and friction points this paper is trying to address.

Knowledge Distillation
Large Language Models
KL Divergence
Policy Distillation
Model Compression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Policy Distillation
Knowledge Distillation
KL Divergence
On-policy Sampling
Token-level Reweighting