🤖 AI Summary
Addressing two key challenges in knowledge distillation (KD), the difficulty of high-dimensional optimization and the lack of label-semantic supervision, this paper formulates KD as a conditional generative problem and proposes the Generative Distribution Distillation (GenDD) framework. Methodologically, GenDD introduces three core components: (1) a Split Tokenization strategy that enables stable and effective unsupervised distillation; (2) Distribution Contraction, a theoretically grounded technique shown to act as a gradient-level surrogate for multi-task learning, implicitly injecting label supervision without an explicit classification loss; and (3) efficient supervised training on multi-step sampled image representations. In the unsupervised setting on ImageNet, GenDD surpasses the KL-divergence baseline by 16.29% in top-1 accuracy. With label supervision, a ResNet-50 distilled via GenDD reaches 82.28% top-1 accuracy on ImageNet, the highest reported among comparable KD methods.
📝 Abstract
In this paper, we formulate knowledge distillation (KD) as a conditional generative problem and propose the *Generative Distribution Distillation (GenDD)* framework. A naive *GenDD* baseline encounters two major challenges: the curse of high-dimensional optimization and the lack of semantic supervision from labels. To address these issues, we introduce a *Split Tokenization* strategy, achieving stable and effective unsupervised KD. Additionally, we develop the *Distribution Contraction* technique to integrate label supervision into the reconstruction objective. Our theoretical analysis demonstrates that *GenDD* with *Distribution Contraction* serves as a gradient-level surrogate for multi-task learning, realizing efficient supervised training without an explicit classification loss on multi-step sampled image representations. To evaluate the effectiveness of our method, we conduct experiments on balanced, imbalanced, and unlabeled data. Experimental results show that *GenDD* performs competitively in the unsupervised setting, surpassing the KL baseline by **16.29%** on the ImageNet validation set. With label supervision, our ResNet-50 achieves **82.28%** top-1 accuracy on ImageNet with 600 epochs of training, establishing a new state-of-the-art.
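Since the paper's implementation is not shown here, the two named ingredients can be illustrated with a minimal, hedged sketch. The function names `split_tokenize` and `contract`, the token count, and the contraction factor `alpha` are all illustrative assumptions, not the authors' code; the sketch only conveys the intuition that tokenization shrinks the per-token optimization space and contraction pulls targets toward a class center:

```python
# Illustrative sketch of two GenDD ideas; names and details are assumptions,
# not the paper's implementation.

def split_tokenize(feat, num_tokens):
    """Split a high-dimensional feature vector into `num_tokens` lower-dimensional
    tokens, so each token is modeled in a smaller, easier-to-optimize space."""
    d = len(feat)
    assert d % num_tokens == 0, "feature dim must divide evenly into tokens"
    step = d // num_tokens
    return [feat[i * step:(i + 1) * step] for i in range(num_tokens)]

def contract(feat, class_center, alpha=0.5):
    """Distribution Contraction (sketch): pull a teacher feature toward its
    class center, implicitly injecting label information into the target."""
    return [c + alpha * (f - c) for f, c in zip(feat, class_center)]
```

Under this reading, `alpha = 1` leaves the teacher distribution untouched (pure unsupervised reconstruction), while smaller values shrink the per-class target spread, which is how label supervision can enter without an explicit classification loss.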