Hybrid Policy Distillation for LLMs

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing knowledge distillation (KD) methods for large language models face performance and stability bottlenecks because their choices of divergence direction, optimization strategy, and data regime are tightly coupled. This work proposes a unified view that reformulates KD as a token-level reweighted log-likelihood objective. It introduces Hybrid Policy Distillation (HPD), which combines the mode-covering behavior of forward KL with the mode-seeking behavior of reverse KL, and pairs off-policy data with lightweight, approximate on-policy sampling. On long-generation math reasoning and short-generation dialogue and code tasks, the method improves optimization stability, computational efficiency, and final performance across diverse model families and scales.
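
To make the "reweighted log-likelihood" reading concrete, the standard token-level gradients of forward and reverse KL can be written side by side. This is a textbook derivation, not the paper's exact formulation: the symbols p (teacher), q_θ (student), c_t (the context of prompt plus previously generated tokens), and the weight w(v) are notation assumed here for illustration.

```latex
% Assumed notation: p = teacher, q_\theta = student, c_t = (x, y_{<t}), v = vocabulary token.
\begin{align}
\mathcal{L}_{\mathrm{FKL}}(\theta)
  &= \sum_{v} p(v \mid c_t)\,\log\frac{p(v \mid c_t)}{q_\theta(v \mid c_t)},
\\
-\nabla_\theta \mathcal{L}_{\mathrm{FKL}}
  &= \sum_{v} \underbrace{p(v \mid c_t)}_{w_{\mathrm{FKL}}(v)}\,
     \nabla_\theta \log q_\theta(v \mid c_t),
\\[4pt]
\mathcal{L}_{\mathrm{RKL}}(\theta)
  &= \sum_{v} q_\theta(v \mid c_t)\,\log\frac{q_\theta(v \mid c_t)}{p(v \mid c_t)},
\\
-\nabla_\theta \mathcal{L}_{\mathrm{RKL}}
  &= \sum_{v} \underbrace{q_\theta(v \mid c_t)\,
     \log\frac{p(v \mid c_t)}{q_\theta(v \mid c_t)}}_{w_{\mathrm{RKL}}(v)}\,
     \nabla_\theta \log q_\theta(v \mid c_t).
\end{align}
% The reverse-KL step uses \sum_v q_\theta(v \mid c_t)\,\nabla_\theta \log q_\theta(v \mid c_t) = 0,
% which removes the constant "+1" term that appears when differentiating q_\theta \log q_\theta.
```

Both descent directions have the form Σ_v w(v) ∇_θ log q_θ(v | c_t) and differ only in the weight: forward KL weights each token by the teacher probability, while reverse KL weights it by the student probability times the teacher-student log-ratio.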

📝 Abstract

Knowledge distillation (KD) is a powerful paradigm for compressing large language models (LLMs), whose effectiveness depends on intertwined choices of divergence direction, optimization strategy, and data regime. We break down the design of existing KD methods and present a unified view that establishes connections between them, reformulating KD as a reweighted log-likelihood objective at the token level. We further propose Hybrid Policy Distillation (HPD), which integrates the complementary advantages of forward and reverse KL to balance mode coverage and mode-seeking, and combines off-policy data with lightweight, approximate on-policy sampling. We validate HPD on long-generation math reasoning as well as short-generation dialogue and code tasks, demonstrating improved optimization stability, computational efficiency, and final performance across diverse model families and scales. The code related to this work is available at https://github.com/zwhong714/Hybrid-Policy-Distillation.
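
As a rough illustration of balancing forward and reverse KL at a single token position, the sketch below mixes the two divergences with a fixed coefficient and returns the gradient of the mixed loss with respect to the student logits. It is a minimal sketch under stated assumptions, not the HPD implementation: the linear mix, the `alpha` parameter, and all function names are hypothetical, and the actual combination rule should be taken from the linked repository.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl(log_a, log_b):
    """KL(a || b) for two distributions given as log-probabilities."""
    return sum(math.exp(la) * (la - lb) for la, lb in zip(log_a, log_b))


def hybrid_kl(teacher_logits, student_logits, alpha=0.5):
    """Return (mixed loss, forward KL, reverse KL) for one token position.

    ASSUMPTION: a fixed linear mix alpha * FKL + (1 - alpha) * RKL, chosen
    only for illustration; it is not taken from the paper.
    """
    log_p = log_softmax(teacher_logits)  # teacher distribution p
    log_q = log_softmax(student_logits)  # student distribution q
    forward = kl(log_p, log_q)           # KL(p || q): mode-covering
    reverse = kl(log_q, log_p)           # KL(q || p): mode-seeking
    return alpha * forward + (1.0 - alpha) * reverse, forward, reverse


def token_weights(teacher_logits, student_logits, alpha=0.5):
    """Per-token weights w(v) such that -grad(loss) = sum_v w(v) * grad(log q(v))."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return [
        alpha * math.exp(lp) + (1.0 - alpha) * math.exp(lq) * (lp - lq)
        for lp, lq in zip(log_p, log_q)
    ]


def logit_gradient(teacher_logits, student_logits, alpha=0.5):
    """Gradient of the mixed loss with respect to the student logits.

    Uses d log q(v) / d z_k = [v == k] - q(k), which gives
    d loss / d z_k = -(w(k) - q(k) * sum_v w(v)).
    """
    q = [math.exp(lq) for lq in log_softmax(student_logits)]
    w = token_weights(teacher_logits, student_logits, alpha)
    total = sum(w)
    return [-(wk - qk * total) for wk, qk in zip(w, q)]


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]  # toy 3-token vocabulary
    student = [0.1, 0.3, -0.2]
    loss, fwd, rev = hybrid_kl(teacher, student, alpha=0.3)
    print("mixed loss:", loss, "forward KL:", fwd, "reverse KL:", rev)
    print("gradient w.r.t. student logits:", logit_gradient(teacher, student, alpha=0.3))
```

Setting `alpha=1.0` recovers pure forward KL, whose logit gradient reduces to q - p, and `alpha=0.0` recovers pure reverse KL. In a real training loop the same quantities would be computed per position over teacher and student logits from an autodiff framework, on a mix of off-policy and student-sampled sequences.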

Problem

Research questions and friction points this paper is trying to address.

Knowledge Distillation

Large Language Models

KL Divergence

Policy Distillation

Model Compression

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Policy Distillation

Knowledge Distillation

KL Divergence