Knowledge Distillation with Adapted Weight

📅 2025-01-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low efficiency, poor stability, and weak interpretability of knowledge distillation (KD) for deploying large language models on resource-constrained devices (e.g., smartphones), this paper proposes KD-AIF, an adaptive data-weighting KD framework grounded in influence functions. It is the first work to incorporate influence functions—originally from robust statistics—into KD, enabling dynamic, principle-driven sample weighting based on the SAFE criteria (Sustainability, Accuracy, Fairness, Explainability). KD-AIF integrates a teacher-student architecture, adaptive weight updating, and semi-supervised collaborative optimization. Evaluated on CIFAR-100, CIFAR-10-4k, SVHN-1k, and GLUE benchmarks, KD-AIF consistently outperforms state-of-the-art methods, achieving superior generalization and model interpretability while maintaining high accuracy.
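As a rough illustration of the teacher-student setup with per-sample weighting described above, the sketch below combines a soft-target distillation term and a hard-label term, each scaled by a per-sample weight. This is a minimal sketch under stated assumptions, not the authors' code: the function name `weighted_kd_loss`, the temperature and mixing hyperparameters, and the fixed weight tensor `w` are placeholders; in KD-AIF the weights would be updated adaptively from influence estimates rather than held constant.

```python
# Minimal sketch (not the authors' code): a per-sample weighted knowledge
# distillation loss in PyTorch. `w` stands in for the adaptive
# influence-based weights that KD-AIF updates during training.
import torch
import torch.nn.functional as F


def weighted_kd_loss(student_logits, teacher_logits, labels, w,
                     temperature=4.0, alpha=0.7):
    """Combine soft-target KL loss and hard-label CE loss, weighted per sample."""
    # Soft targets from the teacher, softened by the temperature.
    soft_t = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_s = F.log_softmax(student_logits / temperature, dim=-1)
    # Per-sample KL divergence (summed over classes), rescaled by T^2 as usual.
    kl = F.kl_div(log_soft_s, soft_t, reduction="none").sum(dim=-1) * temperature ** 2
    # Per-sample cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels, reduction="none")
    per_sample = alpha * kl + (1.0 - alpha) * ce
    # Adaptive weights rescale each sample's contribution before averaging.
    return (w * per_sample).sum() / w.sum()


if __name__ == "__main__":
    torch.manual_seed(0)
    s = torch.randn(8, 10, requires_grad=True)   # student logits
    t = torch.randn(8, 10)                       # teacher logits
    y = torch.randint(0, 10, (8,))               # labels
    w = torch.ones(8)                            # placeholder for adaptive weights
    loss = weighted_kd_loss(s, t, y, w)
    loss.backward()
    print(float(loss))
```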

📝 Abstract
Although large models have shown a strong capacity to solve large-scale problems in many areas, including natural language and computer vision, their voluminous parameters are hard to deploy in real-time systems due to computational and energy constraints. To address this, knowledge distillation through a teacher-student architecture offers a sustainable pathway to compress the knowledge of large models into more manageable sizes without significantly compromising performance. To enhance the robustness and interpretability of this framework, it is critical to understand how individual training data impact model performance, an area that remains underexplored. We propose the Knowledge Distillation with Adaptive Influence Weight (KD-AIF) framework, which leverages influence functions from robust statistics to assign weights to training data, grounded in the four key SAFE principles: Sustainability, Accuracy, Fairness, and Explainability. This novel approach not only optimizes distillation but also increases transparency by revealing the significance of different data. The exploration of various update mechanisms within the KD-AIF framework further elucidates its potential to significantly improve learning efficiency and generalization in student models, marking a step toward more explainable and deployable large models. KD-AIF is effective in knowledge distillation and also shows exceptional performance in semi-supervised learning, outperforming existing baselines and methods on multiple benchmarks (CIFAR-100, CIFAR-10-4k, SVHN-1k, and GLUE).
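To make the influence-function idea more concrete, here is a minimal sketch of one common way to score training samples: approximate each sample's influence on a held-out validation loss by the dot product between its loss gradient and the validation-loss gradient (a first-order shortcut that drops the inverse-Hessian term of the classical influence function), then map the scores to non-negative weights. The helper names (`flat_grad`, `influence_weights`), the softmax mapping, and the use of plain cross-entropy are assumptions for illustration; the paper's actual update mechanisms are not reproduced here.

```python
# Sketch under assumptions, not the paper's exact procedure: first-order
# gradient-similarity scores as a cheap proxy for per-sample influence,
# converted into normalized, non-negative sample weights.
import torch
import torch.nn.functional as F


def flat_grad(loss, params):
    """Flatten the gradient of `loss` w.r.t. `params` into one vector."""
    grads = torch.autograd.grad(loss, params, retain_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])


def influence_weights(model, train_x, train_y, val_x, val_y):
    params = [p for p in model.parameters() if p.requires_grad]
    # Gradient of the validation loss: the direction we want training to help.
    val_loss = F.cross_entropy(model(val_x), val_y)
    g_val = flat_grad(val_loss, params)
    scores = []
    for i in range(train_x.shape[0]):
        loss_i = F.cross_entropy(model(train_x[i:i + 1]), train_y[i:i + 1])
        g_i = flat_grad(loss_i, params)
        # Positive score: this sample's gradient aligns with the validation
        # gradient, i.e. training on it should reduce validation loss.
        scores.append(torch.dot(g_i, g_val))
    scores = torch.stack(scores)
    # Softmax keeps weights positive and normalized; other mappings are possible.
    return F.softmax(scores, dim=0)


if __name__ == "__main__":
    torch.manual_seed(0)
    model = torch.nn.Linear(20, 5)
    tx, ty = torch.randn(16, 20), torch.randint(0, 5, (16,))
    vx, vy = torch.randn(8, 20), torch.randint(0, 5, (8,))
    print(influence_weights(model, tx, ty, vx, vy))
```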
Problem

Research questions and friction points this paper is trying to address.

Efficient Learning
Model Performance
Resource-constrained Environment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge Distillation
Adaptive Importance Weighting
Semi-supervised Learning
Sirong Wu
Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, Zhuhai 519087, China; BNU-HKBU United International College, Zhuhai 519087, China; Faculty of Science, Hong Kong Baptist University, Hong Kong SAR 999077, China
Xi Luo
Intel Corporation
High Performance Computing
Junjie Liu
Department of Political Science, Trinity College Dublin, 2 Clare Street, Dublin 2, Ireland
Yuhui Deng
Professor of Computer Science, Jinan University
Cloud Computing, Information Storage, Data Management, Computer Systems