The Unheard Alternative: Contrastive Explanations for Speech-to-Text Models

📅 2025-09-30

🤖 AI Summary

This work addresses the lack of contrastive explanations for speech-to-text (S2T) generative models. We propose the first contrastive explanation method tailored to S2T. Building on feature attribution, it quantifies how local regions of the input spectrogram contribute to selecting a target output relative to a specific contrastive alternative, the foil (e.g., an alternative word or gender form). This reveals the acoustic cues behind the model's preference for one generation over another. Conventional attribution methods explain *why a given output was produced*. Our approach instead answers *why this output was chosen over a competing candidate*, filling a gap in contrastive interpretability for generative speech models. In a case study on gender assignment in speech translation, the method accurately identifies the spectrogram regions that drive the selection of one gender over the other.
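To illustrate how spectrogram regions can be scored against a foil, here is a minimal occlusion-based sketch in Python with NumPy. It uses a generic perturbation technique and is not necessarily the authors' attribution method. The functions `contrastive_score`, `occlusion_contrastive_attribution` and `toy_model`, as well as the patch size and the zero fill value, are hypothetical. The `model` argument is assumed to return next-token log-probabilities for a fixed decoding prefix. A real S2T system would need a wrapper that force-decodes the shared prefix up to the position where target and foil diverge.

```python
import numpy as np


def contrastive_score(log_probs: np.ndarray, target: int, foil: int) -> float:
    """Log-probability margin of the target token over the foil token."""
    return float(log_probs[target] - log_probs[foil])


def occlusion_contrastive_attribution(model, spec, target, foil, patch=(4, 4), fill=0.0):
    """Score each time-frequency patch by how much occluding it lowers the
    target-vs-foil margin. Positive scores mark regions that favour the target;
    negative scores mark regions that favour the foil."""
    base = contrastive_score(model(spec), target, foil)
    n_time, n_freq = spec.shape
    pt, pf = patch
    attr = np.zeros(spec.shape, dtype=float)
    for t0 in range(0, n_time, pt):
        for f0 in range(0, n_freq, pf):
            masked = spec.copy()
            masked[t0:t0 + pt, f0:f0 + pf] = fill  # occlude one patch
            drop = base - contrastive_score(model(masked), target, foil)
            attr[t0:t0 + pt, f0:f0 + pf] = drop
    return attr


# Hypothetical toy "model" with two candidate outputs: index 0 = masculine form,
# index 1 = feminine form. Only the energy in frames 0-3, bins 4-7 raises the
# feminine logit. Every other region is ignored.
def toy_model(spec: np.ndarray) -> np.ndarray:
    logits = np.array([0.0, spec[0:4, 4:8].sum()])
    return logits - np.logaddexp.reduce(logits)  # log-softmax


spec = np.ones((8, 8))  # stand-in spectrogram: 8 frames x 8 frequency bins
attr = occlusion_contrastive_attribution(toy_model, spec, target=1, foil=0)
print(attr)  # nonzero only in frames 0-3, bins 4-7
```

The log-softmax normalizer cancels in the margin, so the score reduces to the logit difference between target and foil. Taking a difference, rather than scoring the target's probability alone, is what makes the attribution contrastive: a region that raises both candidates equally receives no credit.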

📝 Abstract

Contrastive explanations, which indicate why an AI system produced one output (the target) instead of another (the foil), are widely regarded in explainable AI as more informative and interpretable than standard explanations. However, obtaining such explanations for speech-to-text (S2T) generative models remains an open challenge. Drawing from feature attribution techniques, we propose the first method to obtain contrastive explanations in S2T by analyzing how parts of the input spectrogram influence the choice between alternative outputs. Through a case study on gender assignment in speech translation, we show that our method accurately identifies the audio features that drive the selection of one gender over another. By extending the scope of contrastive explanations to S2T, our work provides a foundation for better understanding S2T models.
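One common way to make the target/foil contrast concrete is shown below. It is offered as an assumption, not as the paper's exact formulation. Each spectrogram region is scored by how much masking it shrinks the log-probability margin of the target over the foil at the decoding step where the two diverge:

```latex
\Delta(X) = \log p_\theta\!\left(y^{\mathrm{t}} \mid X, y_{<i}\right) - \log p_\theta\!\left(y^{\mathrm{f}} \mid X, y_{<i}\right)

a_r = \Delta(X) - \Delta\!\left(X \odot (1 - M_r)\right)
```

Here $X$ is the input spectrogram (time by frequency), $y_{<i}$ is the output prefix shared by both candidates, $y^{\mathrm{t}}$ and $y^{\mathrm{f}}$ are the target and foil tokens, $M_r$ is a hypothetical binary mask equal to 1 on region $r$, and $\odot$ is the elementwise product. A positive $a_r$ means region $r$ pushes the model toward the target rather than the foil. A negative $a_r$ means it favours the foil.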

Problem

Generating contrastive explanations for speech-to-text models

Identifying audio features influencing alternative output choices

Analyzing spectrogram impacts on gender assignment in translation

Innovation

Proposes contrastive explanations for speech-to-text models

Analyzes input spectrogram influence on output choices

Identifies audio features driving gender selection in translation
