Joint Feature and Output Distillation for Low-complexity Acoustic Scene Classification

📅 2025-07-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
For low-complexity acoustic scene classification, this paper proposes a two-stage joint knowledge distillation framework that simultaneously transfers soft label distributions and intermediate-layer feature representations from multiple teachers (PaSST and CP-ResNet), enabling synergistic transfer of semantic knowledge and structural information. Unlike conventional single-stage distillation, the approach enhances the representational capacity of the lightweight student model (CP-Mobile) via soft label averaging and hierarchical feature alignment. Evaluated on the TAU Urban Acoustic Scenes 2022 Mobile dataset, the method achieves a classification accuracy of 59.30% with significantly reduced computational cost, demonstrating its effectiveness and robustness for acoustic scene recognition on resource-constrained edge devices.

📝 Abstract
This report presents a dual-level knowledge distillation framework with multi-teacher guidance for low-complexity acoustic scene classification (ASC) in DCASE2025 Task 1. We propose a distillation strategy that jointly transfers both soft logits and intermediate feature representations. Specifically, we pre-train PaSST and CP-ResNet models as teachers. Logits from the teachers are averaged to generate soft targets, while one CP-ResNet is selected for feature-level distillation. This enables the compact student model (CP-Mobile) to capture both the semantic distribution and structural information conveyed by the teachers. Experiments on the TAU Urban Acoustic Scenes 2022 Mobile dataset (development set) demonstrate that our submitted systems achieve up to 59.30% accuracy.
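The joint objective described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the temperature `T` and weights `alpha`, `beta` are hypothetical hyperparameters, and the real systems operate on PyTorch model outputs rather than raw arrays.

```python
import numpy as np

def softmax(z, T=1.0):
    """Tempered softmax along the last axis (numerically stable)."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def joint_distillation_loss(student_logits, teacher_logits_list,
                            student_feat, teacher_feat,
                            labels, T=2.0, alpha=0.5, beta=0.1):
    """Hard-label CE + KD on averaged multi-teacher soft targets
    + MSE feature alignment. T, alpha, beta are assumed values."""
    # Soft targets: average the teachers' tempered distributions
    soft_targets = np.mean([softmax(t, T) for t in teacher_logits_list], axis=0)
    log_p_student = np.log(softmax(student_logits, T) + 1e-12)
    # Cross-entropy against soft targets, scaled by T^2 as in standard KD
    kd_loss = -np.sum(soft_targets * log_p_student, axis=-1).mean() * T * T
    # Hard-label cross-entropy on the ground-truth scene labels
    p = softmax(student_logits)
    ce_loss = -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    # Feature-level alignment with the selected CP-ResNet teacher
    # (assumes the student feature has been projected to the teacher's size)
    feat_loss = np.mean((student_feat - teacher_feat) ** 2)
    return (1 - alpha) * ce_loss + alpha * kd_loss + beta * feat_loss
```

In practice the student's intermediate feature map usually needs a small learned projection to match the teacher's channel dimension before the MSE term is applied; that projection is omitted here for brevity.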
Problem

Research questions and friction points this paper is trying to address.

Develop low-complexity acoustic scene classification
Jointly distill soft logits and feature representations
Improve accuracy with multi-teacher guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-level knowledge distillation framework
Joint soft logits and feature distillation
Multi-teacher guidance for compact model
Haowen Li
Smart Nation TRANS Lab, Nanyang Technological University, Singapore
Ziyi Yang
Smart Nation TRANS Lab, Nanyang Technological University, Singapore
Mou Wang
Institute of Acoustics, Chinese Academy of Sciences
Ee-Leng Tan
Smart Nation TRANS Lab, Nanyang Technological University, Singapore
Junwei Yeow
Smart Nation TRANS Lab, Nanyang Technological University, Singapore
Santi Peksi
Smart Nation TRANS Lab, Nanyang Technological University, Singapore
Woon-Seng Gan
Professor of Audio Engineering and Director of Smart Nation Lab @ Nanyang Technological University
Active Noise Control · Machine & Deep Learning · Spatial Audio · Perceptual Evaluation