If It's Good Enough for You, It's Good Enough for Me: Transferability of Audio Sufficiencies across Models

📅 2026-04-03
🤖 AI Summary
This study investigates how consistently different audio classification models accept minimal sufficient signals, and uses that consistency to reveal heterogeneity in how the models process information. Introducing a "transferability" metric, the authors evaluate whether a minimal sufficient audio signal extracted for one model receives the same classification when tested on other models, across three tasks: music genre classification, emotion recognition, and deepfake detection. The findings show a cross-model transferability rate of approximately 26% in music genre classification, while the other tasks show much higher variance; deepfake detection models diverge most strongly, in particular a subset termed “flat-earther” models. By combining minimal sufficient signal extraction, cross-model evaluation, and information-theoretic analysis, this work characterizes behavioral discrepancies between models from the perspective of informational consistency, uncovering differences that accuracy and precision do not capture.
📝 Abstract
To gain fresh insights into the information processing characteristics of different audio classification models, we propose transferability analysis. Given a minimal, sufficient signal for a classification on a model $f$, transferability analysis asks whether other models assign this minimal signal the same classification that $f$ did. We define what it means for a sufficient signal to be transferable and perform a large study over $3$ different classification tasks: music genre classification, emotion recognition, and deepfake detection. We find that transferability rates vary with the task, with sufficient signals for music genre being transferable $\approx 26\%$ of the time. The other tasks show much higher variance in transferability and reveal that some models, in particular on deepfake detection, have different transferability behavior. We call these models “flat-earther” models. We investigate deepfake audio in more depth, and show that transferability analysis also allows us to discover information-theoretic differences between the models which are not captured by the more familiar metrics of accuracy and precision.
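The transferability check described in the abstract can be illustrated with a minimal sketch. It assumes the sufficient signals have already been extracted for a source model and that each model is a plain function from a signal to a label. The names `transfers`, `transferability_rates`, and the three toy models are hypothetical and are not taken from the paper, whose exact definition and extraction procedure are not given here.

```python
from typing import Callable, Dict, List, Sequence

Signal = Sequence[float]
Model = Callable[[Signal], str]  # assumption: a model maps a signal to a class label


def transfers(signal: Signal, source: Model, target: Model) -> bool:
    """True if the target model gives the signal the same label as the source model."""
    return target(signal) == source(signal)


def transferability_rates(
    signals: List[Signal], source: Model, targets: Dict[str, Model]
) -> Dict[str, float]:
    """Fraction of the source model's sufficient signals that each target model accepts."""
    if not signals:
        raise ValueError("need at least one sufficient signal")
    return {
        name: sum(transfers(s, source, target) for s in signals) / len(signals)
        for name, target in targets.items()
    }


# Toy stand-ins for real classifiers (purely illustrative).
def peak_model(signal: Signal) -> str:
    return "fake" if max(abs(x) for x in signal) > 0.5 else "real"


def energy_model(signal: Signal) -> str:
    return "fake" if sum(x * x for x in signal) / len(signal) > 0.15 else "real"


def constant_model(signal: Signal) -> str:
    return "real"  # ignores its input entirely


# Assumed to be minimal sufficient signals already extracted for peak_model.
signals = [
    [0.0, 0.9, 0.0, 0.0],
    [0.6, 0.6, 0.6, 0.6],
    [0.1, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.7, 0.0],
]

rates = transferability_rates(
    signals,
    source=peak_model,
    targets={"energy_model": energy_model, "constant_model": constant_model},
)
print(rates)  # {'energy_model': 0.75, 'constant_model': 0.25}
```

In this toy setup the two target models agree with the source on different fractions of the same signals, which is the kind of per-model difference that a single accuracy figure would not show.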
Problem

Research questions and friction points this paper is trying to address.

transferability
audio classification
sufficient signal
deepfake detection
model comparison
Innovation

Methods, ideas, or system contributions that make the work stand out.

transferability analysis
sufficient signal
audio classification
deepfake detection
information processing