🤖 AI Summary
This study addresses a disconnect in speech emotion recognition (SER) research: stated motivations often emphasize high-stakes applications such as healthcare and voice-activated systems, yet the datasets and emotion annotations employed do not reflect those deployment contexts. Through a systematic literature survey, the work analyzes research motivations, dataset selection, and the emotions studied, and finds a gap between aspiration and practice. The authors argue that this gap raises ethical concerns and risks misinterpretation, misuse, and downstream harms. The paper calls for SER research to ground itself in concrete use cases.
📝 Abstract
Critical analyses of emotion recognition technology have raised ethical concerns around task validity and potential downstream impacts, urging researchers to ensure alignment between their stated motivations and practice. However, these discussions have not adequately influenced or drawn from research on speech emotion recognition (SER). We address this gap by conducting a systematic survey of SER research to uncover what stated motivations drive this work and whether they align with the datasets and emotions studied. We find that while SER research identifies appealing goals, such as well-situated voice-activated systems or healthcare applications, commonly used datasets do not reflect these proposed deployment contexts, thus presenting a gap between motivations and research practices. We argue that such gaps engender ethical concerns, and that SER research should reassert itself with concrete use cases to prevent misinterpretations, misuse, and downstream harms.