Prototypical Contrastive Learning For Improved Few-Shot Audio Classification

📅 2025-09-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Few-shot learning for audio classification remains underexplored compared with the image domain, even though labeled audio data are often scarce. Method: We propose a framework that integrates a supervised contrastive loss, in an angle-aware (angular) variant, with prototypical networks. The aim is to make embeddings more compact within each class and more separable across classes. To improve robustness, we apply SpecAugment to produce several augmented versions of each input and use a self-attention mechanism to fuse their embeddings into a single representation. The entire model is trained end-to-end. Contribution/Results: Our approach achieves state-of-the-art performance on the MetaAudio benchmark in the 5-way 5-shot setting, outperforming the existing few-shot audio classification methods it was compared against. The results indicate that combining supervised contrastive learning, prototype-based inference, and data augmentation is an effective strategy for low-resource audio classification.
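The prototype-based inference mentioned above can be sketched under a few stated assumptions. Each class prototype is taken as the mean of its support embeddings, and a query is scored by a softmax over negative squared Euclidean distances, as in the standard prototypical-network formulation. The audio encoder is omitted, embeddings are plain Python lists, and the helper names (`compute_prototypes`, `classify`) and toy values are hypothetical. The paper's actual encoder and distance function may differ.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def squared_euclidean(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def compute_prototypes(support: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """One prototype per class: the mean of that class's support embeddings."""
    return {label: mean_vector(embs) for label, embs in support.items()}


def classify(query: Vector, prototypes: Dict[str, Vector]) -> Dict[str, float]:
    """Class probabilities from a softmax over negative squared distances."""
    logits = {label: -squared_euclidean(query, p) for label, p in prototypes.items()}
    top = max(logits.values())  # shift by the max logit for numerical stability
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Toy 2-D embeddings for a 2-way, 2-shot episode (illustrative values only).
support = {
    "dog_bark": [[1.0, 0.0], [0.8, 0.2]],
    "siren": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = compute_prototypes(support)
probs = classify([0.7, 0.3], prototypes)
print(max(probs, key=probs.get))  # dog_bark
```

The query lies closer to the `dog_bark` prototype (squared distance 0.08 versus 0.72), so it receives most of the probability mass.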

📝 Abstract
Few-shot learning has emerged as a powerful paradigm for training models with limited labeled data, addressing challenges in scenarios where large-scale annotation is impractical. While extensive research has been conducted in the image domain, few-shot learning in audio classification remains relatively underexplored. In this work, we investigate the effect of integrating a supervised contrastive loss into prototypical few-shot training for audio classification. Specifically, we show that an angular loss further improves performance compared with the standard contrastive loss. Our method applies SpecAugment followed by a self-attention mechanism to combine information from multiple augmented versions of the input into a single unified embedding. We evaluate our approach on MetaAudio, a benchmark comprising five datasets with predefined splits, standardized preprocessing, and a comprehensive set of few-shot learning models for comparison. The proposed approach achieves state-of-the-art performance in the 5-way, 5-shot setting.
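A 5-way, 5-shot evaluation is organized into episodes. Five classes are drawn, five labeled support clips per class serve as references, and held-out query clips from the same classes are classified against them. The minimal sampler below rests on assumptions: the dataset is represented as a dictionary from class label to clip identifiers, and `n_query = 5` per class is a hypothetical choice. MetaAudio's own episode configuration is not reproduced here.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Example = Tuple[str, str]  # (clip_id, class_label)


def sample_episode(
    dataset: Dict[str, Sequence[str]],
    n_way: int = 5,
    k_shot: int = 5,
    n_query: int = 5,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Example], List[Example]]:
    """Draw one N-way K-shot episode as (support set, query set)."""
    rng = random.Random() if rng is None else rng
    # Only classes with enough clips for both support and query are eligible.
    eligible = [c for c, clips in dataset.items() if len(clips) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query clips")
    support: List[Example] = []
    query: List[Example] = []
    for label in rng.sample(eligible, n_way):
        # Sample without replacement so support and query clips never overlap.
        clips = rng.sample(list(dataset[label]), k_shot + n_query)
        support += [(clip, label) for clip in clips[:k_shot]]
        query += [(clip, label) for clip in clips[k_shot:]]
    return support, query


# Hypothetical dataset: 10 classes with 20 clip IDs each.
data = {f"class_{c}": [f"clip_{c}_{i}" for i in range(20)] for c in range(10)}
support, query = sample_episode(data, rng=random.Random(0))
print(len(support), len(query))  # 25 25
```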
Problem

Improving few-shot audio classification performance
Integrating supervised contrastive loss with prototypical training
Addressing limited labeled data challenges in audio
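One common form of supervised contrastive loss treats every other same-class sample in a batch as a positive for each anchor $i$:

$$\mathcal{L}_i = -\frac{1}{|P(i)|}\sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \neq i} \exp(z_i \cdot z_a / \tau)},$$

where the $z$ are L2-normalized embeddings and $\tau$ is a temperature. The sketch below implements that standard form in plain Python. The temperature of 0.1 is an assumption, and the sketch does not implement the angular variant the paper reports as stronger.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,
) -> float:
    """Supervised contrastive loss averaged over anchors that have a positive."""
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Cosine similarity / temperature to every other sample (anchor excluded).
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        top = max(sims.values())
        log_denom = top + math.log(sum(math.exp(s - top) for s in sims.values()))
        # Mean negative log-probability of the positives under the softmax.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(supcon_loss(emb, [0, 0, 1, 1]) < supcon_loss(emb, [0, 1, 0, 1]))  # True
```

Labels that agree with the geometry of the embeddings give a much lower loss than labels that cut across it. During training, such a term would typically be added to the prototypical cross-entropy loss with a weighting factor. That weighting is a hypothetical hyperparameter, not a value taken from the paper.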
Innovation

Integration of a supervised contrastive loss into prototypical training
Angular loss that outperforms the standard contrastive loss
SpecAugment views fused into a single embedding via self-attention
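The multi-view idea can be illustrated with two simplified pieces. The first is SpecAugment-style masking, which sets one random frequency band and one random time span to zero. The original SpecAugment also includes time warping, which is omitted here. The second is single-head self-attention across the embeddings of the augmented views, mean-pooled into one vector. The following are assumptions for illustration: the identity query/key/value projections, the mask widths, the mean pooling, and the `toy_encoder` stand-in for the real network. A trained model would learn its projections, and the paper's exact fusion may differ.

```python
import math
import random
from typing import List, Optional

Matrix = List[List[float]]  # rows = frequency bins, columns = time frames


def spec_augment(
    spec: Matrix,
    max_freq_width: int = 8,
    max_time_width: int = 20,
    rng: Optional[random.Random] = None,
) -> Matrix:
    """Copy of `spec` with one random frequency band and one random time span zeroed."""
    rng = random.Random() if rng is None else rng
    out = [row[:] for row in spec]
    n_freq, n_time = len(out), len(out[0])
    f = rng.randint(0, min(max_freq_width, n_freq))  # band height (may be 0)
    f0 = rng.randint(0, n_freq - f)
    for r in range(f0, f0 + f):
        out[r] = [0.0] * n_time
    t = rng.randint(0, min(max_time_width, n_time))  # span length (may be 0)
    t0 = rng.randint(0, n_time - t)
    for row in out:
        for c in range(t0, t0 + t):
            row[c] = 0.0
    return out


def softmax(xs: List[float]) -> List[float]:
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_fuse(views: Matrix) -> List[float]:
    """Fuse V view embeddings (V x d) into one d-dim embedding.

    Simplification: one head with identity query/key/value projections,
    followed by mean pooling over the attended views.
    """
    d = len(views[0])
    attended = []
    for q in views:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in views])
        attended.append([sum(w * v[j] for w, v in zip(weights, views)) for j in range(d)])
    return [sum(vec[j] for vec in attended) / len(attended) for j in range(d)]


def toy_encoder(spec: Matrix) -> List[float]:
    """Hypothetical stand-in for the audio encoder: mean of each frequency bin."""
    return [sum(row) / len(row) for row in spec]


spec = [[1.0] * 50 for _ in range(40)]  # toy 40-bin x 50-frame spectrogram
views = [spec_augment(spec, rng=random.Random(seed)) for seed in range(4)]
fused = self_attention_fuse([toy_encoder(v) for v in views])
print(len(fused))  # 40
```

Each augmented view hides a different part of the spectrogram. Fusing their embeddings lets information that survives in one view compensate for what is masked in another.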
Christos Sgouropoulos
Multimedia Analysis Group of the Computational Intelligence Laboratory (MagCIL), Institute of Informatics and Telecommunications, NCSR "DEMOKRITOS"
Christos Nikou
Multimedia Analysis Group of the Computational Intelligence Laboratory (MagCIL), Institute of Informatics and Telecommunications, NCSR "DEMOKRITOS"
Stefanos Vlachos
Multimedia Analysis Group of the Computational Intelligence Laboratory (MagCIL), Institute of Informatics and Telecommunications, NCSR "DEMOKRITOS"
Vasileios Theiou
Multimedia Analysis Group of the Computational Intelligence Laboratory (MagCIL), Institute of Informatics and Telecommunications, NCSR "DEMOKRITOS"
Christos Foukanelis
Multimedia Analysis Group of the Computational Intelligence Laboratory (MagCIL), Institute of Informatics and Telecommunications, NCSR "DEMOKRITOS"
Theodoros Giannakopoulos
Principal Researcher, NCSR Demokritos / Director of Machine Learning at Behavioral Signals
pattern recognition, multimedia signal analysis, audio analysis