🤖 AI Summary
To address the poor adaptability to novel classes and limited transferability of acoustic scene classification (ASC) models on edge devices, this paper proposes ContrastASC. Our method constructs a semantics-preserving embedding space via supervised contrastive fine-tuning and introduces a contrastive representation distillation mechanism that efficiently transfers this structured semantic knowledge from a pre-trained teacher model to a lightweight student model. Unlike conventional fine-tuning and standard knowledge distillation, ContrastASC maintains high closed-set classification accuracy while significantly improving few-shot generalization to unseen classes. Experiments across multiple ASC benchmarks show that ContrastASC strikes an effective trade-off between model compactness, which enables edge deployment, and semantic extensibility, consistently outperforming baselines in both accuracy and adaptability and validating its practicality for real-world edge ASC applications.
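The supervised contrastive fine-tuning described above pulls embeddings of clips from the same acoustic scene together and pushes different scenes apart. A minimal NumPy sketch of a SupCon-style batch loss illustrates the idea; the function name, batch layout, and temperature value are illustrative assumptions, not details from the paper:

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.07):
    """SupCon-style loss sketch (illustrative, not the paper's exact objective).

    embeddings: (N, D) array of clip embeddings; labels: (N,) int scene labels.
    Each anchor is contrasted against all other samples in the batch; samples
    sharing the anchor's label act as positives. Anchors with no positive
    partner in the batch are skipped.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature                       # pairwise cosine sims
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)           # exclude self-pairs
    # log-softmax over all other samples for each anchor
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & ~self_mask
    per_anchor = [-log_prob[i, positives[i]].mean()
                  for i in range(n) if positives[i].any()]
    return float(np.mean(per_anchor))
```

With this objective, a batch whose same-label embeddings already cluster together incurs a lower loss than one where the labels cut across the clusters, which is exactly the geometry that later supports few-shot adaptation.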
📝 Abstract
Acoustic scene classification (ASC) models on edge devices typically operate under fixed class assumptions, lacking the transferability needed for real-world applications that require adaptation to new or refined acoustic categories. We propose ContrastASC, which learns generalizable acoustic scene representations by structuring the embedding space to preserve semantic relationships between scenes, enabling adaptation to unseen categories without retraining. Our approach combines supervised contrastive fine-tuning of pre-trained models with contrastive representation distillation to transfer this structured knowledge to compact student models. Our evaluation shows that ContrastASC improves few-shot adaptation to unseen categories while maintaining strong closed-set performance.
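The distillation step transfers the teacher's embedding geometry, rather than its logits, to the compact student. One common contrastive formulation treats the teacher's embedding of the same input as the student's positive and the other teacher embeddings in the batch as negatives; the sketch below assumes this formulation and invented names, and is not the paper's exact loss:

```python
import numpy as np

def contrastive_distill_loss(student_z, teacher_z, temperature=0.07):
    """Contrastive representation distillation sketch (illustrative).

    student_z, teacher_z: (N, D) embeddings of the same N inputs from the
    student and the (frozen) teacher. Each student row must score its own
    teacher row highest, so the student inherits the teacher's structure.
    """
    s = student_z / np.linalg.norm(student_z, axis=1, keepdims=True)
    t = teacher_z / np.linalg.norm(teacher_z, axis=1, keepdims=True)
    sim = s @ t.T / temperature                       # student-teacher sims
    # log-softmax across teacher embeddings; the diagonal holds positives
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

A student whose embeddings align with the teacher's incurs a near-zero loss, while a misaligned student is penalized, so minimizing this loss preserves the teacher's semantic neighborhood structure in the compact model.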