🤖 AI Summary
Traditional knowledge distillation relies on a pre-trained large teacher model, which incurs substantial storage overhead, high training cost, and ambiguity in teacher selection; existing teacher-free distillation methods often require architectural modifications or complex training procedures. This paper proposes a lightweight, general-purpose self-distillation framework that uses only a single student network, introducing no auxiliary modules, structural changes, or additional learnable parameters. Its core innovation is an intra-class image patch swapping mechanism: guided by class labels, random patches are cropped and exchanged between samples of the same class. This is, per the authors, the first use of intra-class local structural rearrangement in self-distillation, jointly promoting implicit knowledge transfer and feature disentanglement. Combined with consistency regularization and a feature-level self-distillation loss, the method improves ResNet-34 Top-1 accuracy by 1.8% on CIFAR-100 and 1.3% on ImageNet-1K, clearly outperforming standard self-distillation baselines.
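The intra-class patch swapping step can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the function name, the choice of one shared patch location per class, and the pairing-by-permutation strategy are all assumptions for clarity.

```python
import numpy as np

def intra_class_patch_swap(images, labels, patch_size, rng=None):
    """Hypothetical sketch: copy a random square patch between
    same-class samples, guided by the class labels.

    images: (N, C, H, W) float array; labels: (N,) int class ids.
    Returns a new array; inputs are not modified.
    """
    rng = rng if rng is not None else np.random.default_rng()
    out = images.copy()
    _, _, H, W = images.shape
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        if len(idx) < 2:
            continue  # need at least two samples of this class to swap
        donors = rng.permutation(idx)  # assumption: pair via a random permutation
        # assumption: one shared patch location per class for simplicity
        y = int(rng.integers(0, H - patch_size + 1))
        x = int(rng.integers(0, W - patch_size + 1))
        for receiver, donor in zip(idx, donors):
            out[receiver, :, y:y + patch_size, x:x + patch_size] = \
                images[donor, :, y:y + patch_size, x:x + patch_size]
    return out
```

Because patches only move between samples that share a label, the class semantics of each augmented image are preserved, which is what lets the swapped views serve as consistency targets for the single student network.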