On-Policy Context Distillation for Language Models

📅 2026-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of enabling language models to internalize contextual knowledge without compromising generalization. The authors propose the On-Policy Context Distillation (OPCD) framework, which combines on-policy distillation with context distillation: the student model is trained on its own generated trajectories and optimized by minimizing the reverse KL divergence against a context-conditioned teacher model. This approach enables efficient internalization of both system prompts and experiential knowledge, as well as cross-size knowledge transfer. Experimental results demonstrate that OPCD consistently outperforms baseline methods across mathematical reasoning, text-based games, and domain-specific tasks, achieving higher task accuracy while better preserving out-of-distribution generalization, and allowing smaller models to inherit experiential knowledge from larger ones.

📝 Abstract
Context distillation enables language models to internalize in-context knowledge into their parameters. In our work, we propose On-Policy Context Distillation (OPCD), a framework that bridges on-policy distillation with context distillation by training a student model on its own generated trajectories while minimizing reverse Kullback-Leibler divergence against a context-conditioned teacher. We demonstrate the effectiveness of OPCD on two important applications: experiential knowledge distillation, where models extract and consolidate transferable knowledge from their historical solution traces, and system prompt distillation, where models internalize beneficial behaviors encoded in optimized prompts. Across mathematical reasoning, text-based games, and domain-specific tasks, OPCD consistently outperforms baseline methods, achieving higher task accuracy while better preserving out-of-distribution capabilities. We further show that OPCD enables effective cross-size distillation, where smaller student models can internalize experiential knowledge from larger teachers.
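The core objective described in the abstract — minimizing reverse KL divergence between the student's distribution and a context-conditioned teacher's distribution over student-sampled tokens — can be sketched numerically. The following minimal NumPy illustration is our own (function names, toy shapes, and logits are hypothetical, not from the paper); it computes per-position KL(student ‖ teacher) over next-token distributions along a trajectory, which is the quantity OPCD would drive toward zero during training.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def reverse_kl(student_logits, teacher_logits, eps=1e-12):
    # Reverse KL: KL(p_student || p_teacher), averaged over positions.
    # The expectation is under the STUDENT's distribution, which is why
    # trajectories are sampled from the student (on-policy).
    p_s = softmax(student_logits)
    log_ps = np.log(p_s + eps)
    log_pt = np.log(softmax(teacher_logits) + eps)
    return float((p_s * (log_ps - log_pt)).sum(axis=-1).mean())

rng = np.random.default_rng(0)
T, V = 5, 8  # toy trajectory length and vocabulary size
student_logits = rng.normal(size=(T, V))        # student scoring its own sampled tokens
teacher_logits = student_logits.copy()          # hypothetical context-conditioned teacher

# When student and teacher agree exactly, the reverse KL vanishes.
assert abs(reverse_kl(student_logits, teacher_logits)) < 1e-9
```

In a real implementation the teacher logits would come from the same (or a larger) model conditioned on the extra context (system prompt or experiential knowledge), while the student sees only the bare input; the gradient of this loss with respect to the student's parameters is what gets backpropagated.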
Problem

Research questions and friction points this paper is trying to address.

context distillation
language models
knowledge internalization
on-policy learning
prompt distillation
Innovation

Methods, ideas, or system contributions that make the work stand out.

On-Policy Distillation
Context Distillation
Reverse KL Divergence
Experiential Knowledge Distillation
Prompt Distillation