ERM-MinMaxGAP: Benchmarking and Mitigating Gender Bias in Multilingual Multimodal Speech-LLM Emotion Recognition

📅 2026-03-22
🤖 AI Summary
This study examines gender bias in multilingual, multimodal speech LLMs for emotion recognition, where it has been unclear how fairness varies across languages and modalities. The authors construct a multilingual (English, Japanese, German) multimodal benchmark based on MELD-ST and propose ERM-MinMaxGAP, a training objective that jointly optimizes emotion recognition performance and gender fairness. The method augments empirical risk minimization with a MinMaxGAP regularizer on the maximum male-female loss gap within each language and modality, together with an adaptive fairness weighting mechanism. Built on the Qwen2-Audio backbone, the approach improves multilingual SER performance by 5.5% and 5.0% in the unimodal and multimodal settings, respectively, while reducing the overall gender bias gap by 0.1% and 1.4%.

📝 Abstract
Speech emotion recognition (SER) systems can exhibit gender-related performance disparities, but how such bias manifests in multilingual speech LLMs across languages and modalities is unclear. We introduce a novel multilingual, multimodal benchmark built on MELD-ST, spanning English, Japanese, and German, to quantify language-specific SER performance and gender gaps. We find that bias is strongly language-dependent and that multimodal fusion does not reliably improve fairness. To address these issues, we propose ERM-MinMaxGAP, a fairness-informed training objective that augments empirical risk minimization (ERM) with an adaptive fairness weighting mechanism and a novel MinMaxGAP regularizer on the maximum male-female loss gap within each language and modality. Building upon the Qwen2-Audio backbone, our ERM-MinMaxGAP approach improves multilingual SER performance by 5.5% and 5.0% while reducing the overall gender bias gap by 0.1% and 1.4% in the unimodal and multimodal settings, respectively.
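
The objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one plausible reading, in which the regularizer is the largest absolute difference between mean male and mean female loss over all (language, modality) cells in a batch. The function name `erm_minmaxgap_loss`, its arguments, and the fixed `fairness_weight` are hypothetical; the abstract does not specify how the adaptive fairness weight is computed, so it is left here as a plain constant.

```python
from collections import defaultdict


def erm_minmaxgap_loss(losses, languages, modalities, genders, fairness_weight=1.0):
    """Hypothetical sketch: ERM term plus a penalty on the largest gender loss gap.

    losses      per-sample loss values (floats)
    languages   per-sample language tag, e.g. "en", "ja", "de"
    modalities  per-sample modality tag, e.g. "audio", "audio+text"
    genders     per-sample speaker gender, "male" or "female"
    """
    # ERM term: plain average of the per-sample losses.
    erm = sum(losses) / len(losses)

    # Accumulate [loss sum, count] for every (language, modality, gender) group.
    sums = defaultdict(lambda: [0.0, 0])
    for loss, lang, mod, gen in zip(losses, languages, modalities, genders):
        cell = sums[(lang, mod, gen)]
        cell[0] += loss
        cell[1] += 1

    # Absolute male-female gap in mean loss for each (language, modality) cell.
    gaps = {}
    for lang, mod in {(l, m) for (l, m, _) in sums}:
        male = sums.get((lang, mod, "male"))
        female = sums.get((lang, mod, "female"))
        if male is None or female is None:
            continue  # a gap is undefined when one gender is absent from the batch
        gaps[(lang, mod)] = abs(male[0] / male[1] - female[0] / female[1])

    # Assumed regularizer: the worst (maximum) gap across all cells.
    max_gap = max(gaps.values(), default=0.0)

    total = erm + fairness_weight * max_gap
    return total, erm, gaps


total, erm, gaps = erm_minmaxgap_loss(
    losses=[1.0, 3.0, 2.0, 2.0],
    languages=["en", "en", "ja", "ja"],
    modalities=["audio", "audio", "audio", "audio"],
    genders=["male", "female", "male", "female"],
    fairness_weight=0.5,
)
print(total, erm, gaps)
```

In this toy batch the English cell has a gap of 2.0 and the Japanese cell has none, so only the English gap contributes to the penalty. Penalizing the maximum rather than the average gap focuses the pressure on the worst-off cell, which fits the finding that bias is strongly language-dependent.
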
Problem

Research questions and friction points this paper is trying to address.

gender bias
speech emotion recognition
multilingual
multimodal
fairness
Innovation

Methods, ideas, or system contributions that make the work stand out.

multilingual speech emotion recognition
gender bias mitigation
MinMaxGAP regularizer
fairness-aware training
multimodal fusion