Lightweight and Generalizable Acoustic Scene Representations via Contrastive Fine-Tuning and Distillation

📅 2025-10-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the poor adaptability to novel classes and limited transferability of acoustic scene classification (ASC) models on edge devices, this paper proposes ContrastASC. The method constructs a semantically structured embedding space via supervised contrastive fine-tuning and introduces a contrastive representation distillation mechanism that transfers this structured semantic knowledge from a pre-trained teacher model to a lightweight student model. Unlike conventional fine-tuning and standard knowledge distillation, ContrastASC maintains high closed-set classification accuracy while significantly improving generalization to unseen classes in few-shot settings. Experiments on multiple ASC benchmarks show that ContrastASC strikes an effective trade-off between model compactness, which keeps it deployable on edge hardware, and semantic extensibility, consistently outperforming baselines in both accuracy and adaptability and validating its practicality for real-world edge ASC applications.

📝 Abstract
Acoustic scene classification (ASC) models on edge devices typically operate under fixed class assumptions, lacking the transferability needed for real-world applications that require adaptation to new or refined acoustic categories. We propose ContrastASC, which learns generalizable acoustic scene representations by structuring the embedding space to preserve semantic relationships between scenes, enabling adaptation to unseen categories without retraining. Our approach combines supervised contrastive fine-tuning of pre-trained models with contrastive representation distillation to transfer this structured knowledge to compact student models. Our evaluation shows that ContrastASC improves few-shot adaptation to unseen categories while maintaining strong closed-set performance.
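The supervised contrastive fine-tuning the abstract describes pulls embeddings of same-scene clips together and pushes different scenes apart. A minimal NumPy sketch of the standard supervised contrastive (SupCon) objective is below; the function name, temperature value, and toy embeddings are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of embeddings.

    Each anchor is attracted to all other samples sharing its label
    (positives) and repelled from the rest, after L2 normalization.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    n = len(labels)
    logits = z @ z.T / temperature
    logits -= np.eye(n) * 1e9               # exclude self-similarity
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    labels = np.asarray(labels)
    pos_mask = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
    pos_counts = pos_mask.sum(axis=1)
    # average log-probability over each anchor's positives, negated
    loss = -(log_prob * pos_mask).sum(axis=1) / np.maximum(pos_counts, 1)
    return loss[pos_counts > 0].mean()
```

Batches where same-class embeddings already cluster yield a lower loss than batches where they are spread apart, which is exactly the structure that later supports few-shot matching of unseen scene categories.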
Problem

Research questions and friction points this paper is trying to address.

Developing adaptable acoustic scene classification models for edge devices
Enabling transfer learning to unseen acoustic categories without retraining
Creating compact yet generalizable acoustic representations through distillation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive fine-tuning structures acoustic embedding space
Contrastive distillation transfers knowledge to compact models
Learned representations enable adaptation to unseen categories
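The distillation step transfers the teacher's embedding structure, not just its class logits, to the compact student. One common way to realize this is an InfoNCE-style objective in which each student embedding must identify its own teacher embedding among all teachers in the batch; the sketch below is a hedged illustration of that idea in NumPy, not the paper's exact formulation.

```python
import numpy as np

def contrastive_distill_loss(student, teacher, temperature=0.1):
    """InfoNCE-style representation distillation.

    For sample i, the positive pair is (student_i, teacher_i); all other
    teacher embeddings in the batch act as negatives.
    """
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = s @ t.T / temperature          # (N, N) cross-model similarities
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_prob).mean()        # positives lie on the diagonal
```

Minimizing this loss makes the student's pairwise similarity structure mirror the teacher's, so the semantic geometry learned by contrastive fine-tuning survives compression to the edge-sized model.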
Kuang Yuan
Carnegie Mellon University
Audio Processing, Acoustics, Ubiquitous Computing, Mobile Health
Yang Gao
Meta Reality Labs, Redmond, WA, USA
Xilin Li
Meta Reality Labs, Redmond, WA, USA
Xinhao Mei
Meta
audio signal processing, machine learning, machine listening, multimodal learning
Syavosh Zadissa
Meta Reality Labs, Redmond, WA, USA
Tarun Pruthi
Meta
Saeed Bagheri Sereshki
Meta Reality Labs, Redmond, WA, USA