Creating a Good Teacher for Knowledge Distillation in Acoustic Scene Classification

📅 2025-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how key attributes of teacher models in knowledge distillation enhance the performance and cross-device generalization of low-complexity student models for acoustic scene classification (ASC). Through systematic ablation studies, we identify, for the first time, four decisive factors: teacher model scale, architectural design, device-robust training (e.g., domain adaptation and device-aware data augmentation), and dynamic weighted multi-teacher ensembling. Based on these findings, we propose a reusable set of teacher model design principles. Evaluated on the DCASE benchmark, our approach improves student model accuracy by +3.2% and improves cross-device robustness, reducing relative classification error by 27%. This work establishes the first principled framework for teacher model design in ASC, addressing a critical gap in the literature.

📝 Abstract
Knowledge Distillation (KD) is a widespread technique for compressing the knowledge of large models into more compact and efficient models. KD has proved to be highly effective in building well-performing low-complexity Acoustic Scene Classification (ASC) systems and has been used in all top-ranked submissions to this task in the annual DCASE challenge over the past three years. There is extensive research available on establishing the KD process, designing efficient student models, and forming well-performing teacher ensembles. However, less research has been conducted on investigating which teacher model attributes are beneficial for low-complexity students. In this work, we try to close this gap by studying the effects on the student's performance when using different teacher network architectures, varying the teacher model size, training them with different device generalization methods, and applying different ensembling strategies. The results show that teacher model size, device generalization methods, the ensembling strategy, and the ensemble size are key factors for a well-performing student network.
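The abstract describes KD as transferring a large teacher's knowledge into a compact student. A minimal sketch of the classic soft-target distillation loss (temperature-scaled KL divergence in the style of Hinton et al.), not necessarily the exact loss used in this paper:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature softens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student class distributions,
    scaled by T^2 so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return kl * temperature ** 2

# A student that matches the teacher's logits exactly incurs zero loss.
print(kd_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))  # → 0.0
```

In practice this term is combined with the ordinary cross-entropy on hard labels, with a mixing weight tuned per task.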
Problem

Research questions and friction points this paper is trying to address.

Investigates teacher model attributes for effective knowledge distillation.
Explores impact of teacher architecture, size, and training methods on students.
Identifies key factors for optimizing student network performance in ASC.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Study teacher model attributes for student performance
Vary teacher size and device generalization methods
Apply different ensembling strategies for optimization
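The ensembling strategies listed above can be illustrated with a weighted average of per-teacher soft targets; weighting by a per-teacher score such as validation accuracy is a hypothetical choice here, not the paper's exact scheme:

```python
def ensemble_soft_targets(teacher_probs, weights=None):
    """Weighted average of per-teacher class distributions.

    teacher_probs: list of per-teacher probability lists (same length each).
    weights: optional per-teacher weights (e.g. validation accuracies);
             uniform averaging when omitted.
    """
    n = len(teacher_probs)
    if weights is None:
        weights = [1.0] * n
    total = sum(weights)
    norm = [w / total for w in weights]  # normalize so weights sum to 1
    num_classes = len(teacher_probs[0])
    return [sum(norm[t] * teacher_probs[t][c] for t in range(n))
            for c in range(num_classes)]

# Two hypothetical teachers; the second is trusted three times as much.
targets = ensemble_soft_targets(
    [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]],
    weights=[1.0, 3.0])
print(targets)
```

The resulting distribution can then serve as the teacher side of a distillation loss, so the student learns from the whole ensemble at once.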