ReDistill: Residual Encoded Distillation for Peak Memory Reduction

๐Ÿ“… 2024-06-06
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 1
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
To address excessive peak memory consumption during high-resolution inference of large models on edge devices, this paper proposes a residual encoded distillation (ReDistill) framework. Within the teacher–student paradigm, it aggressively downsamples the student's feature maps via pooling and introduces a novel residual encoder that explicitly models and reconstructs the structured detail lost to downsampling. It further combines multi-scale feature alignment with a denoising distillation strategy adapted to diffusion models, jointly supervising feature representations and output responses. Experiments show that for image classification, peak memory is reduced 4-5x with markedly smaller accuracy degradation than baseline methods, and for diffusion-based generative models, peak memory drops to roughly 25% of the baseline while preserving image diversity and fidelity. The work is presented as the first to systematically incorporate residual encoding into knowledge distillation, achieving an effective trade-off between memory compression and performance stability.
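As a back-of-the-envelope illustration of why aggressive pooling shrinks peak memory (a sketch, not the paper's exact accounting): activation memory for a feature map scales with H x W x C, so an extra stride-s pool early in the network cuts every downstream map by a factor of s squared. All numbers below are hypothetical.

```python
# Rough, illustrative estimate of activation memory for a single feature map.
# Assumes fp32 activations (4 bytes per element); sizes are hypothetical.
def feature_map_bytes(h: int, w: int, c: int, bytes_per_elem: int = 4) -> int:
    """Memory footprint of one (C, H, W) activation tensor, in bytes."""
    return h * w * c * bytes_per_elem

baseline = feature_map_bytes(224, 224, 64)  # early layer at full resolution
pooled = feature_map_bytes(112, 112, 64)    # same layer after an extra stride-2 pool
print(baseline / pooled)                    # 4.0 -- one stride-2 pool gives 4x for this map
```

Stacking two such pools (stride 4 overall) would give the 16x per-map shrinkage behind the paper's reported 4-5x reduction in *peak* memory, since the peak also includes weights and other buffers that pooling does not touch.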

๐Ÿ“ Abstract
The expansion of neural network sizes and the enhanced resolution of modern image sensors result in heightened memory and power demands to process modern computer vision models. In order to deploy these models in extremely resource-constrained edge devices, it is crucial to reduce their peak memory, which is the maximum memory consumed during the execution of a model. A naive approach to reducing peak memory is aggressive down-sampling of feature maps via pooling with large stride, which often results in unacceptable degradation in network performance. To mitigate this problem, we propose residual encoded distillation (ReDistill) for peak memory reduction in a teacher-student framework, in which a student network with less memory is derived from the teacher network using aggressive pooling. We apply our distillation method to multiple problems in computer vision, including image classification and diffusion-based image generation. For image classification, our method yields 4x-5x theoretical peak memory reduction with less degradation in accuracy for most CNN-based architectures. For diffusion-based image generation, our proposed distillation method yields a denoising network with 4x lower theoretical peak memory while maintaining decent diversity and fidelity for image generation. Experiments demonstrate our method's superior performance compared to other feature-based and response-based distillation methods when applied to the same student network. The code is available at https://github.com/mengtang-lab/ReDistill.
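A minimal NumPy sketch of the teacher-student setup the abstract describes: the student's features are produced by aggressive pooling, and a residual term is added back to recover detail before computing a feature-level distillation loss. The "residual encoder" here is a hypothetical per-channel scaling, chosen only for brevity; the paper's actual encoder architecture is not described on this page.

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_pool2d(x: np.ndarray, stride: int) -> np.ndarray:
    """Aggressive average pooling: (C, H, W) -> (C, H//stride, W//stride)."""
    c, h, w = x.shape
    return x.reshape(c, h // stride, stride, w // stride, stride).mean(axis=(2, 4))

def upsample_nearest(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbor upsampling back to the teacher's resolution."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

# Teacher feature map at full resolution; student map after aggressive pooling.
teacher_feat = rng.standard_normal((8, 16, 16))
student_feat = avg_pool2d(teacher_feat, stride=4)  # 4x smaller along each spatial axis

# Hypothetical stand-in for the residual encoder: predict the detail lost by
# pooling from the upsampled student features via a learnable per-channel scale.
upsampled = upsample_nearest(student_feat, factor=4)
scale = rng.standard_normal((8, 1, 1)) * 0.01      # toy "learned" parameters
reconstructed = upsampled + scale * upsampled

# Feature-level distillation loss: match the reconstruction to the teacher.
distill_loss = np.mean((reconstructed - teacher_feat) ** 2)
print(distill_loss >= 0.0)
```

In training, the loss above would be minimized jointly with the task loss (and, per the summary, a response-level distillation term), so the residual encoder learns to compensate for what the large-stride pooling discards.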
Problem

Research questions and friction points this paper is trying to address.

Reduces peak memory in CNNs for edge devices
Mitigates performance loss from aggressive pooling
Enhances image classification and diffusion generation efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Residual encoded distillation reduces peak memory
Teacher-student framework with aggressive pooling
Maintains accuracy and diversity in vision tasks
๐Ÿ”Ž Similar Papers
No similar papers found.