🤖 AI Summary
Few-shot audio classification remains underexplored, constrained by the scarcity of labeled data. Method: We propose a framework that integrates an angle-aware supervised contrastive loss with prototypical networks, improving intra-class compactness and inter-class separability. To make the representations more robust, we apply SpecAugment-based spectral augmentation and a self-attention mechanism that fuses the augmented views into a single embedding. The model is trained end-to-end to encourage semantic consistency and generalization. Contribution/Results: The approach achieves state-of-the-art performance on the MetaAudio benchmark under the 5-way 5-shot setting, outperforming existing few-shot audio classification methods by combining supervised contrastive learning, prototype-based inference, and robust feature augmentation.
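The prototype-based inference mentioned above can be sketched as follows. This is a minimal illustration of the standard prototypical-network recipe (class prototypes as mean support embeddings, nearest-prototype assignment under Euclidean distance); the function names are ours and the sketch is not the paper's exact implementation.

```python
import numpy as np

def class_prototypes(support, labels):
    """Mean embedding per class. support: (N, D) embeddings, labels: (N,)."""
    classes = np.unique(labels)
    protos = np.stack([support[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def nearest_prototype(queries, support, labels):
    """Assign each query embedding to the class of its nearest prototype."""
    classes, protos = class_prototypes(support, labels)
    # squared Euclidean distance from each query to each prototype: (Q, C)
    dists = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]
```

In a 5-way 5-shot episode, `support` would hold 25 embeddings (5 per class) produced by the trained encoder, and each query is labeled by its closest class mean.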
📝 Abstract
Few-shot learning has emerged as a powerful paradigm for training models with limited labeled data, addressing scenarios where large-scale annotation is impractical. While extensive research has been conducted in the image domain, few-shot learning for audio classification remains relatively underexplored. In this work, we investigate the effect of integrating a supervised contrastive loss into prototypical few-shot training for audio classification. In particular, we demonstrate that an angular loss further improves performance compared to the standard contrastive loss. Our method applies SpecAugment and then a self-attention mechanism to encapsulate the diverse information of the augmented input versions into one unified embedding. We evaluate our approach on MetaAudio, a benchmark comprising five datasets with predefined splits, standardized preprocessing, and a comprehensive set of few-shot learning models for comparison. The proposed approach achieves state-of-the-art performance in the 5-way, 5-shot setting.