Lightweight and Generalizable Acoustic Scene Representations via Contrastive Fine-Tuning and Distillation

📅 2025-10-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the poor adaptability to novel classes and limited transferability of acoustic scene classification (ASC) models on edge devices, this paper proposes ContrastASC. The method constructs a semantically structured embedding space via supervised contrastive fine-tuning and introduces a contrastive representation distillation mechanism that transfers this structured semantic knowledge from a pre-trained teacher model to a lightweight student model. Unlike conventional fine-tuning and standard knowledge distillation, ContrastASC maintains high closed-set classification accuracy while significantly improving generalization to unseen classes in few-shot settings. Experiments on multiple ASC benchmarks show that ContrastASC strikes an effective trade-off between model compactness, which keeps it deployable on edge hardware, and semantic extensibility, consistently outperforming baselines in both accuracy and adaptability and validating its practicality for real-world edge ASC applications.

📝 Abstract
Acoustic scene classification (ASC) models on edge devices typically operate under fixed class assumptions, lacking the transferability needed for real-world applications that require adaptation to new or refined acoustic categories. We propose ContrastASC, which learns generalizable acoustic scene representations by structuring the embedding space to preserve semantic relationships between scenes, enabling adaptation to unseen categories without retraining. Our approach combines supervised contrastive fine-tuning of pre-trained models with contrastive representation distillation to transfer this structured knowledge to compact student models. Our evaluation shows that ContrastASC improves few-shot adaptation to unseen categories while maintaining strong closed-set performance.
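The supervised contrastive fine-tuning the abstract describes pulls embeddings of same-scene clips together and pushes different scenes apart. A minimal NumPy sketch of the standard supervised contrastive (SupCon) objective is below; the function name, temperature value, and toy embeddings are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of embeddings.

    Each anchor is attracted to all other samples sharing its label
    (positives) and repelled from the rest, after L2 normalization.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    n = len(labels)
    logits = z @ z.T / temperature
    logits -= np.eye(n) * 1e9               # exclude self-similarity
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    labels = np.asarray(labels)
    pos_mask = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
    pos_counts = pos_mask.sum(axis=1)
    # average log-probability over each anchor's positives, negated
    loss = -(log_prob * pos_mask).sum(axis=1) / np.maximum(pos_counts, 1)
    return loss[pos_counts > 0].mean()
```

Batches where same-class embeddings already cluster yield a lower loss than batches where they are spread apart, which is exactly the structure that later supports few-shot matching of unseen scene categories.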
Problem

Research questions and friction points this paper is trying to address.

Developing adaptable acoustic scene classification models for edge devices
Enabling transfer learning to unseen acoustic categories without retraining
Creating compact yet generalizable acoustic representations through distillation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive fine-tuning structures acoustic embedding space
Contrastive distillation transfers knowledge to compact models
Learned representations enable adaptation to unseen categories
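The distillation step transfers the teacher's embedding structure, not just its class logits, to the compact student. One common way to realize this is an InfoNCE-style objective in which each student embedding must identify its own teacher embedding among all teachers in the batch; the sketch below is a hedged illustration of that idea in NumPy, not the paper's exact formulation.

```python
import numpy as np

def contrastive_distill_loss(student, teacher, temperature=0.1):
    """InfoNCE-style representation distillation.

    For sample i, the positive pair is (student_i, teacher_i); all other
    teacher embeddings in the batch act as negatives.
    """
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = s @ t.T / temperature          # (N, N) cross-model similarities
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_prob).mean()        # positives lie on the diagonal
```

Minimizing this loss makes the student's pairwise similarity structure mirror the teacher's, so the semantic geometry learned by contrastive fine-tuning survives compression to the edge-sized model.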
Kuang Yuan
Carnegie Mellon University
Audio Processing, Acoustics, Ubiquitous Computing, Mobile Health
Yang Gao
Meta Reality Labs, Redmond, WA, USA
Xilin Li
Meta Reality Labs, Redmond, WA, USA
Xinhao Mei
Meta
audio signal processing, machine learning, machine listening, multimodal learning
Syavosh Zadissa
Meta Reality Labs, Redmond, WA, USA
Tarun Pruthi
Meta
Saeed Bagheri Sereshki
Meta Reality Labs, Redmond, WA, USA