How Class Ontology and Data Scale Affect Audio Transfer Learning

📅 2026-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically investigates how the ontological structure and scale of pretraining data, as well as its semantic similarity to the downstream task, influence transfer performance in audio transfer learning. By pretraining deep neural networks on subsets of the AudioSet ontology and fine-tuning them on three distinct tasks (acoustic scene classification, bird activity recognition, and spoken command recognition), the authors establish a multi-task evaluation framework. For the first time, they quantitatively disentangle the relative contributions of category count, sample size, and task similarity, demonstrating that all three factors positively affect transferability, with semantic similarity between source and target tasks exerting the dominant influence. These findings reveal a key principle governing effective knowledge transfer in audio representation learning.

📝 Abstract
Transfer learning is a crucial concept in deep learning that allows artificial neural networks to benefit from a large pre-training data basis when confronted with a task of limited data. Despite its ubiquitous use and clear benefits, many open questions remain regarding the inner workings of transfer learning, in particular regarding when and how well it works. To that end, we perform a rigorous study of audio-to-audio transfer learning, in which we pre-train various models on (ontology-based) subsets of AudioSet and fine-tune them on three computer audition tasks, namely acoustic scene recognition, bird activity recognition, and speech command recognition. We report that increasing both the number of samples and the number of classes in the pre-training data has a positive impact on transfer learning. This effect is, however, generally surpassed by the similarity between the pre-training and downstream tasks, which can lead the model to learn comparable features.
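The pre-train-then-fine-tune recipe the abstract describes can be sketched minimally in PyTorch. Everything below is an illustrative stand-in, not the authors' actual setup: the tiny convolutional encoder, random tensors in place of AudioSet spectrograms, and the class counts (50 source classes, 10 target scenes) are all hypothetical. The point is the mechanics of transfer: reuse the encoder weights, swap the task-specific head.

```python
import torch
import torch.nn as nn

def make_model(num_classes: int) -> nn.Sequential:
    """Hypothetical spectrogram classifier: a small conv encoder plus a
    task-specific linear head (the final layer)."""
    return nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1),  # encoder
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(16, num_classes),                  # head (replaced on transfer)
    )

# 1) "Pre-train" on a source task with many classes (AudioSet-subset stand-in).
source_model = make_model(num_classes=50)
x = torch.randn(8, 1, 64, 64)           # fake batch of log-mel spectrograms
y = torch.randint(0, 50, (8,))
opt = torch.optim.Adam(source_model.parameters(), lr=1e-3)
loss = nn.CrossEntropyLoss()(source_model(x), y)
opt.zero_grad(); loss.backward(); opt.step()

# 2) Transfer: copy the pre-trained encoder weights into a fresh model
#    with a new head sized for the downstream task (e.g. 10 acoustic scenes).
target_model = make_model(num_classes=10)
target_model[0].load_state_dict(source_model[0].state_dict())

# 3) Fine-tune the whole target model on the (small) downstream dataset.
xt = torch.randn(8, 1, 64, 64)
yt = torch.randint(0, 10, (8,))
opt_t = torch.optim.Adam(target_model.parameters(), lr=1e-4)
loss_t = nn.CrossEntropyLoss()(target_model(xt), yt)
opt_t.zero_grad(); loss_t.backward(); opt_t.step()
```

The paper's central question then becomes: how do the number of source classes (here 50), the number of source samples, and the semantic closeness of source and target tasks affect the quality of the encoder weights copied in step 2?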
Problem

Research questions and friction points this paper is trying to address.

transfer learning
audio classification
class ontology
data scale
pre-training
Innovation

Methods, ideas, or system contributions that make the work stand out.

audio transfer learning
class ontology
data scale
task similarity
pre-training
Manuel Milling
CHI – Chair of Health Informatics, Technical University of Munich, Munich, Germany
Andreas Triantafyllopoulos
Technical University of Munich
machine learning
affective computing
computer audition
Alexander Gebhard
CHI – Chair of Health Informatics, Technical University of Munich, Munich, Germany
Simon Rampp
CHI – Chair of Health Informatics, Technical University of Munich, Munich, Germany
Björn W. Schuller
CHI – Chair of Health Informatics, Technical University of Munich, Munich, Germany