🤖 AI Summary
Supervised fine-tuning (SFT) often causes models to superficially imitate training data without grasping underlying reasoning logic—termed the “imitation problem.” To address this, we propose a multi-stage criticality-guided distillation framework that, for the first time, jointly models explanatory critique generation and response refinement as a ternary mapping task. We reinterpret knowledge distillation from an entropy-analytic perspective as Bayesian posterior updating, thereby mitigating format drift. Our method integrates large language model–driven critique generation, response optimization, ternary-supervised training, and entropy-driven uncertainty modeling. Evaluated on AMC23 (mathematical reasoning), our approach achieves a 17.5% absolute accuracy gain; on MMLU-Pro (comprehensive language understanding), it yields a 6.3% improvement. Crucially, it significantly enhances reasoning consistency and robustly suppresses output format deviation.
📝 Abstract
Supervised fine-tuning (SFT) on expert demonstrations often suffers from the *imitation problem*, where the model learns to reproduce correct responses without *understanding* the underlying rationale. To address this limitation, we propose Critique-Guided Distillation (CGD), a novel multi-stage framework that integrates teacher-generated *explanatory critiques* and *refined responses* into the SFT process. A student model is then trained to map the triplet of prompt, teacher critique, and its own initial response to the corresponding refined teacher response, thereby learning both *what* to imitate and *why*. Using entropy-based analysis, we show that CGD reduces refinement uncertainty and can be interpreted as a Bayesian posterior update. We perform an extensive empirical evaluation of CGD on a variety of benchmark tasks and demonstrate significant gains on both mathematical reasoning (AMC23, +17.5%) and language understanding (MMLU-Pro, +6.3%), while successfully mitigating the format-drift issues observed in prior critique fine-tuning (CFT) techniques.
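The ternary mapping at the heart of CGD — (prompt, student's initial response, teacher critique) → refined teacher response — can be sketched as a training-data construction step. This is a minimal illustration under assumed formatting conventions; the field names, prompt template, and example content are hypothetical, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class CGDExample:
    """One critique-guided distillation record (field names are illustrative)."""
    prompt: str             # original task prompt
    student_draft: str      # student model's initial response
    teacher_critique: str   # teacher's explanatory critique of the draft
    refined_response: str   # teacher's refined response (the SFT target)

def format_cgd_example(ex: CGDExample) -> dict:
    # Serialize the ternary input (prompt, draft, critique) into a single
    # conditioning string; the refined response is the supervised target.
    source = (
        f"Problem: {ex.prompt}\n"
        f"Initial answer: {ex.student_draft}\n"
        f"Critique: {ex.teacher_critique}\n"
        f"Refined answer:"
    )
    return {"input": source, "target": ex.refined_response}

# Hypothetical example record (contents invented for illustration):
example = CGDExample(
    prompt="What is 17 * 24?",
    student_draft="17 * 24 = 398",
    teacher_critique="The arithmetic is off: 17*24 = 17*20 + 17*4 = 340 + 68.",
    refined_response="17 * 24 = 408",
)
pair = format_cgd_example(example)
```

Training on such pairs conditions the student on *why* its draft was wrong (the critique), not just on the correct final answer, which is the claimed mechanism behind the reduced refinement uncertainty.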