🤖 AI Summary
This study addresses gender bias in multilingual, multimodal speech LLMs for emotion recognition, and the lack of clarity about their fairness across languages and modalities. To tackle this, the authors construct a multilingual (English, Japanese, German) multimodal benchmark based on MELD-ST and propose ERM-MinMaxGAP, a novel approach that jointly optimizes emotion recognition performance and gender fairness. The method combines empirical risk minimization with a MinMaxGAP regularization term and an adaptive fairness weighting mechanism. Experiments show that the proposed approach improves emotion recognition accuracy by 5.5% and 5.0% in the unimodal and multimodal settings, respectively, while reducing the gender bias gap by 0.1% and 1.4%.
📝 Abstract
Speech emotion recognition (SER) systems can exhibit gender-related performance disparities, but how such bias manifests in multilingual speech LLMs across languages and modalities is unclear. We introduce a novel multilingual, multimodal benchmark built on MELD-ST, spanning English, Japanese, and German, to quantify language-specific SER performance and gender gaps. We find that bias is strongly language-dependent and that multimodal fusion does not reliably improve fairness. To address these issues, we propose ERM-MinMaxGAP, a fairness-informed training objective that augments empirical risk minimization (ERM) with an adaptive fairness weighting mechanism and a novel MinMaxGAP regularizer penalizing the maximum male-female loss gap within each language and modality. Building upon the Qwen2-Audio backbone, our ERM-MinMaxGAP approach improves multilingual SER performance by 5.5% and 5.0% while reducing the overall gender bias gap by 0.1% and 1.4% in the unimodal and multimodal settings, respectively.
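To make the shape of such an objective concrete, here is a minimal sketch of an ERM term combined with a MinMaxGAP-style penalty on the largest male-female loss gap across languages. All names, the dictionary layout, and the adaptive-weight rule are illustrative assumptions, not the paper's exact formulation:

```python
# Illustrative sketch of an ERM + MinMaxGAP-style training objective.
# group_losses maps each language to per-gender average losses on a batch.
# The adaptive weighting rule below is an assumption for illustration only.

def minmax_gap(group_losses):
    """Largest absolute male-female loss gap across languages."""
    return max(abs(losses["male"] - losses["female"])
               for losses in group_losses.values())

def erm_minmaxgap_loss(group_losses, base_weight=1.0):
    """ERM term (mean of all group losses) plus a weighted gap penalty.

    The fairness weight scales with the current gap relative to the ERM
    loss, so larger disparities are penalized more strongly.
    """
    all_losses = [l for per_gender in group_losses.values()
                  for l in per_gender.values()]
    erm = sum(all_losses) / len(all_losses)
    gap = minmax_gap(group_losses)
    adaptive_weight = base_weight * gap / (erm + 1e-8)  # assumed rule
    return erm + adaptive_weight * gap

# Example: per-language, per-gender average losses on one batch
losses = {
    "en": {"male": 0.9, "female": 1.2},
    "ja": {"male": 1.5, "female": 1.0},
    "de": {"male": 1.1, "female": 1.1},
}
total = erm_minmaxgap_loss(losses)
```

In a real training loop these would be differentiable tensor losses rather than floats, and the per-group averages would be computed from gender and language labels in each batch.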