Towards interpretable emotion recognition: Identifying key features with machine learning

📅 2025-08-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Unsupervised speech pre-trained models (e.g., wav2vec 2.0, HuBERT) lack interpretability in emotion recognition, hindering their deployment in high-stakes, trust-critical domains such as clinical mental health assessment. Method: We propose a cross-contextual, robust interpretable feature identification framework that systematically analyzes intermediate representations of pre-trained models. Integrating gradient-weighted class activation mapping (Grad-CAM), feature attribution, and acoustic prior knowledge, the framework identifies key acoustic features—such as F0 dynamics, spectral tilt, and temporal energy envelope—that exhibit strong causal relevance to emotion discrimination and possess clear physical meaning. Contribution/Results: Extensive experiments across multilingual and noisy conditions demonstrate stable extraction of generalizable, interpretable feature subsets. The framework maintains state-of-the-art performance while significantly enhancing model transparency, providing a deployable interpretability foundation for trustworthy AI in clinical emotion evaluation and related high-assurance applications.

Technology Category

Application Category

📝 Abstract
Unsupervised methods, such as wav2vec2 and HuBERT, have achieved state-of-the-art performance in audio tasks, leading to a shift away from research on interpretable features. However, the lack of interpretability in these methods limits their applicability in critical domains like medicine, where understanding feature relevance is crucial. To better understand the features of unsupervised models, it remains critical to identify the interpretable features relevant to a given task. In this work, we focus on emotion recognition and use machine learning algorithms to identify and generalize the most important interpretable features for this task. While previous studies have explored feature relevance in emotion recognition, they are often constrained by narrow contexts and present inconsistent findings. Our approach aims to overcome these limitations, providing a broader and more robust framework for identifying the most important interpretable features.
Problem

Research questions and friction points this paper is trying to address.

Identifying interpretable features in emotion recognition models
Overcoming limitations of narrow contexts in feature relevance studies
Providing robust framework for key feature generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses wav2vec2 and HuBERT for emotion recognition
Identifies interpretable features with machine learning
Provides robust framework for feature relevance
🔎 Similar Papers