🤖 AI Summary
Supervised fine-tuning (SFT) often causes models to superficially imitate training data without grasping underlying reasoning logic—termed the “imitation problem.” To address this, we propose a multi-stage criticality-guided distillation framework that, for the first time, jointly models explanatory critique generation and response refinement as a ternary mapping task. We reinterpret knowledge distillation from an entropy-analytic perspective as Bayesian posterior updating, thereby mitigating format drift. Our method integrates large language model–driven critique generation, response optimization, ternary-supervised training, and entropy-driven uncertainty modeling. Evaluated on AMC23 (mathematical reasoning), our approach achieves a 17.5% absolute accuracy gain; on MMLU-Pro (comprehensive language understanding), it yields a 6.3% improvement. Crucially, it significantly enhances reasoning consistency and robustly suppresses output format deviation.
📝 Abstract
Supervised fine-tuning (SFT) on expert demonstrations often suffers from the *imitation problem*, where the model learns to reproduce correct responses without *understanding* the underlying rationale. To address this limitation, we propose Critique-Guided Distillation (CGD), a novel multi-stage framework that integrates teacher-generated *explanatory critiques* and *refined responses* into the SFT process. A student model is then trained to map the triplet of prompt, teacher critique, and its own initial response to the corresponding refined teacher response, thereby learning both *what* to imitate and *why*. Using entropy-based analysis, we show that CGD reduces refinement uncertainty and can be interpreted as a Bayesian posterior update. We perform an extensive empirical evaluation of CGD on a variety of benchmark tasks and demonstrate significant gains on both mathematical reasoning (AMC23, +17.5%) and language understanding (MMLU-Pro, +6.3%), while successfully mitigating the format-drift issues observed in prior critique fine-tuning (CFT) techniques.
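The ternary mapping at the heart of CGD — (prompt, student's initial response, teacher critique) → refined teacher response — can be sketched as a training-data construction step. This is a minimal illustration under assumed formatting conventions; the field names, prompt template, and example content are hypothetical, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class CGDExample:
    """One critique-guided distillation record (field names are illustrative)."""
    prompt: str             # original task prompt
    student_draft: str      # student model's initial response
    teacher_critique: str   # teacher's explanatory critique of the draft
    refined_response: str   # teacher's refined response (the SFT target)

def format_cgd_example(ex: CGDExample) -> dict:
    # Serialize the ternary input (prompt, draft, critique) into a single
    # conditioning string; the refined response is the supervised target.
    source = (
        f"Problem: {ex.prompt}\n"
        f"Initial answer: {ex.student_draft}\n"
        f"Critique: {ex.teacher_critique}\n"
        f"Refined answer:"
    )
    return {"input": source, "target": ex.refined_response}

# Hypothetical example record (contents invented for illustration):
example = CGDExample(
    prompt="What is 17 * 24?",
    student_draft="17 * 24 = 398",
    teacher_critique="The arithmetic is off: 17*24 = 17*20 + 17*4 = 340 + 68.",
    refined_response="17 * 24 = 408",
)
pair = format_cgd_example(example)
```

Training on such pairs conditions the student on *why* its draft was wrong (the critique), not just on the correct final answer, which is the claimed mechanism behind the reduced refinement uncertainty.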