🤖 AI Summary
Traditional knowledge distillation relies on a pre-trained large teacher model, which incurs substantial storage overhead, high training cost, and ambiguity in teacher selection; existing teacher-free distillation methods often require architectural modifications or complex training procedures. This paper proposes a lightweight, general-purpose self-distillation framework that uses only a single student network, introducing no auxiliary modules, structural changes, or additional learnable parameters. Its core innovation is an intra-class image patch swapping mechanism: guided by class labels, random patches are cropped and exchanged between samples of the same class. This is, per the authors, the first use of intra-class local structural rearrangement in self-distillation, jointly promoting implicit knowledge transfer and feature disentanglement. Combined with consistency regularization and a feature-level self-distillation loss, the method improves ResNet-34 Top-1 accuracy by 1.8% on CIFAR-100 and 1.3% on ImageNet-1K, clearly outperforming standard self-distillation baselines.
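The intra-class patch swapping step can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the function name, the choice of one shared patch location per class, and the pairing-by-permutation strategy are all assumptions for clarity.

```python
import numpy as np

def intra_class_patch_swap(images, labels, patch_size, rng=None):
    """Hypothetical sketch: copy a random square patch between
    same-class samples, guided by the class labels.

    images: (N, C, H, W) float array; labels: (N,) int class ids.
    Returns a new array; inputs are not modified.
    """
    rng = rng if rng is not None else np.random.default_rng()
    out = images.copy()
    _, _, H, W = images.shape
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        if len(idx) < 2:
            continue  # need at least two samples of this class to swap
        donors = rng.permutation(idx)  # assumption: pair via a random permutation
        # assumption: one shared patch location per class for simplicity
        y = int(rng.integers(0, H - patch_size + 1))
        x = int(rng.integers(0, W - patch_size + 1))
        for receiver, donor in zip(idx, donors):
            out[receiver, :, y:y + patch_size, x:x + patch_size] = \
                images[donor, :, y:y + patch_size, x:x + patch_size]
    return out
```

Because patches only move between samples that share a label, the class semantics of each augmented image are preserved, which is what lets the swapped views serve as consistency targets for the single student network.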